Hi, I considered that it would finally be time to get started with the state management rewrite :-) Just a mail with the final plan, in case anyone has comments.
I will move the functions applying the states into a new file, opengl_utils.c(name inspired by old ddraw). I don't want to put them into drawprim.c because that file is already quite long.
For a start I will manage changed states per device, not per context, because for per-context tracking I'd like to have thread safety, which will take a bit longer to commit. The changed states will be managed in a standard Wine list, with pointers to the dirty elements from the stateblock (the state.changed field). This will allow us to work efficiently with the list:
* Add an element: constant complexity
* Remove an element: constant
* Check if an element exists: constant (look at the stateblock)
* Empty the list: constant
* Apply the changed states from the list: linear in the number of dirty states, at most the number of existing states.
I will chain empty elements in a second list to avoid unnecessary HeapAlloc and HeapFree calls. Many d3d games already spend 10% of their processing time in heap management :-/
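A minimal C sketch of the dirty list with a free list of recycled entries, as described above. To keep it self-contained it uses a hand-rolled intrusive doubly linked list rather than Wine's struct list, and plain malloc() in place of HeapAlloc(); MAX_STATES, struct dirty_list and the dirty_* helpers are hypothetical names, and the changed[] array stands in for the stateblock's changed field.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_STATES 4096   /* hypothetical size of the state index space */

struct dirty_entry {
    struct dirty_entry *prev, *next;
    unsigned int state;
};

struct dirty_list {
    struct dirty_entry head;                 /* sentinel of the dirty list */
    struct dirty_entry *free_list;           /* recycled entries, chained via next */
    struct dirty_entry *changed[MAX_STATES]; /* like stateblock->changed: NULL or the entry */
};

static void dirty_init(struct dirty_list *l)
{
    l->head.prev = l->head.next = &l->head;
    l->free_list = NULL;
    for (size_t i = 0; i < MAX_STATES; i++)
        l->changed[i] = NULL;
}

/* O(1): just look at the per-state pointer. */
static int dirty_contains(const struct dirty_list *l, unsigned int state)
{
    return l->changed[state] != NULL;
}

/* O(1): reuse a free entry if possible, never queue a state twice. */
static int dirty_add(struct dirty_list *l, unsigned int state)
{
    struct dirty_entry *e;

    if (l->changed[state])
        return 1;
    if (l->free_list) {
        e = l->free_list;
        l->free_list = e->next;
    } else if (!(e = malloc(sizeof(*e)))) {
        return 0;
    }
    e->state = state;
    e->prev = l->head.prev;
    e->next = &l->head;
    l->head.prev->next = e;
    l->head.prev = e;
    l->changed[state] = e;
    return 1;
}

/* O(1): unlink and put the entry on the free list instead of freeing it. */
static void dirty_remove(struct dirty_list *l, unsigned int state)
{
    struct dirty_entry *e = l->changed[state];

    if (!e)
        return;
    e->prev->next = e->next;
    e->next->prev = e->prev;
    l->changed[state] = NULL;
    e->next = l->free_list;
    l->free_list = e;
}

/* Linear in the number of dirty states: apply each one, then drop it. */
static size_t dirty_apply_all(struct dirty_list *l, void (*apply)(unsigned int))
{
    size_t n = 0;

    while (l->head.next != &l->head) {
        unsigned int s = l->head.next->state;
        apply(s);
        dirty_remove(l, s);
        n++;
    }
    return n;
}
```

A real version would also release the entries on the free list when the device is destroyed.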
All states (render states, sampler states, texture stage states, bound textures, shaders, vertex type) will share the same list. The index of the changed state will identify the type of the state, e.g.
    #define renderstate_entry(a)  ((a) + 0)
    #define samplerstate_entry(a) ((a) + 1000) /* or renderstate_entry(max_render_state) */
    #define texturestate_entry(a) ((a) + 2000)
and so on.
This will allow us to group different states, e.g. D3DRS_LIGHTINGENABLE with the vertex type.
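A sketch of the combined index space, assuming the fixed bases of 0, 1000 and 2000 from the example; the enum and the two decoding helpers are hypothetical and only illustrate how the type and the per-type number can be recovered from one index.

```c
#define RENDERSTATE_BASE  0u
#define SAMPLERSTATE_BASE 1000u
#define TEXTURESTATE_BASE 2000u
#define STATE_LIMIT       3000u   /* hypothetical end of the index space */

#define renderstate_entry(a)  (RENDERSTATE_BASE + (a))
#define samplerstate_entry(a) (SAMPLERSTATE_BASE + (a))
#define texturestate_entry(a) (TEXTURESTATE_BASE + (a))

enum state_type {
    STATE_TYPE_RENDER,
    STATE_TYPE_SAMPLER,
    STATE_TYPE_TEXTURE,
    STATE_TYPE_INVALID
};

/* Recover the state type from a combined index. */
static enum state_type state_type_of(unsigned int index)
{
    if (index < SAMPLERSTATE_BASE) return STATE_TYPE_RENDER;
    if (index < TEXTURESTATE_BASE) return STATE_TYPE_SAMPLER;
    if (index < STATE_LIMIT)       return STATE_TYPE_TEXTURE;
    return STATE_TYPE_INVALID;
}

/* Recover the number of the state within its own type. */
static unsigned int state_number_of(unsigned int index)
{
    switch (state_type_of(index)) {
    case STATE_TYPE_RENDER:  return index - RENDERSTATE_BASE;
    case STATE_TYPE_SAMPLER: return index - SAMPLERSTATE_BASE;
    case STATE_TYPE_TEXTURE: return index - TEXTURESTATE_BASE;
    default:                 return 0;
    }
}
```

With fixed bases, a type must never have more states than the gap to the next base; deriving each base from the previous type's highest state, as the comment in the macro suggests, avoids wasting that gap.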
As many d3d states affect the same opengl state (e.g. D3DRS_FOGVERTEXMODE, D3DRS_FOGTABLEMODE, D3DRS_FOGSTART, D3DRS_FOGEND), the states can be grouped together for efficient application. This also works across different types.
When a state is marked dirty, the Set*State function checks whether the representative of the state's group is already marked dirty, and if so, the state isn't put on the list again. This keeps the list size <= the number of known states, and avoids applying the same state twice :-)
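A small sketch of the representative check, assuming hypothetical ST_* ids in place of the real WINED3DRS_* values and a fixed-size array instead of the linked list: the four fog states share ST_FOGVERTEXMODE as their representative, so dirtying any combination of them queues a single entry.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state ids standing in for the WINED3DRS_* values. */
enum { ST_FOGVERTEXMODE, ST_FOGTABLEMODE, ST_FOGSTART, ST_FOGEND, ST_LIGHTING, ST_COUNT };

/* Each state maps to the representative of its group; the fog states share one. */
static const unsigned int representative[ST_COUNT] = {
    [ST_FOGVERTEXMODE] = ST_FOGVERTEXMODE,
    [ST_FOGTABLEMODE]  = ST_FOGVERTEXMODE,
    [ST_FOGSTART]      = ST_FOGVERTEXMODE,
    [ST_FOGEND]        = ST_FOGVERTEXMODE,
    [ST_LIGHTING]      = ST_LIGHTING,
};

struct dirty_set {
    bool is_dirty[ST_COUNT];      /* indexed by representative */
    unsigned int list[ST_COUNT];  /* cannot overflow: each representative is queued at most once */
    size_t count;
};

static void mark_dirty(struct dirty_set *d, unsigned int state)
{
    unsigned int rep = representative[state];

    if (d->is_dirty[rep])
        return;                   /* the group is already queued */
    d->is_dirty[rep] = true;
    d->list[d->count++] = rep;
}
```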
States will be applied in drawprim. BltOverride and UnlockRect change states on their own, and they can put the states they change onto the dirty list so drawprim will reset them to what the application wants. This avoids gratuitous setting and resetting. An extra last_was_blt field could be used to avoid BltOverride setting its own states again and again.
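A sketch of the BltOverride/drawprim interplay with the suggested last_was_blt flag. The device structure, the ST_* ids and the gl_calls counter (which simulates GL state changes) are hypothetical; the point is that consecutive blits set up their own state only once, and drawprim restores only what the blit marked dirty.

```c
#include <stdbool.h>

/* Hypothetical ids for the states a blit clobbers. */
enum { ST_ZENABLE, ST_ALPHATESTENABLE, ST_LIGHTING, ST_COUNT };

struct device {
    bool dirty[ST_COUNT];
    bool last_was_blt;
    int gl_calls;          /* counts simulated GL state changes */
};

static void blt_override(struct device *dev)
{
    if (!dev->last_was_blt) {
        /* Set up the blit's own state once (simulated: three GL calls). */
        dev->gl_calls += 3;
        dev->last_was_blt = true;
    }
    /* The blit clobbered these, so the next draw must restore the application's values. */
    dev->dirty[ST_ZENABLE] = true;
    dev->dirty[ST_ALPHATESTENABLE] = true;
    dev->dirty[ST_LIGHTING] = true;
}

static void draw_prim(struct device *dev)
{
    for (int s = 0; s < ST_COUNT; s++) {
        if (dev->dirty[s]) {
            dev->gl_calls++;   /* reapply the application's value (simulated) */
            dev->dirty[s] = false;
        }
    }
    dev->last_was_blt = false;
}
```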
Any comments on that?
Stefan
On 19/10/06, Stefan Dösinger stefandoesinger@gmx.at wrote:
I will move the functions applying the states into a new file, opengl_utils.c(name inspired by old ddraw). I don't want to put them into drawprim.c because that file is already quite long.
Not sure about the _utils part there, since it's pretty much core functionality, but fair enough.
vertex type) will share the same list. The index of the changed state will identify the type of the state, e.g.
    #define renderstate_entry(a)  ((a) + 0)
    #define samplerstate_entry(a) ((a) + 1000) /* or renderstate_entry(max_render_state) */
    #define texturestate_entry(a) ((a) + 2000)
and so on.
This will allow us to group different states, e.g. D3DRS_LIGHTINGENABLE with the vertex type.
Can't you do that without putting everything in a single list?
As many d3d states affect the same opengl state (e.g. D3DRS_FOGVERTEXMODE, D3DRS_FOGTABLEMODE, D3DRS_FOGSTART, D3DRS_FOGEND), the states can be grouped together for efficient application. This also works across different types.
I think typically that only goes for states of the same type, i.e. different render states that affect the same GL state or different texture stages that affect the same GL state, but not so much across state types. I.e., it should be fairly uncommon for a render state to affect the same state as a texture stage state.
States will be applied in drawprim. BltOverride and UnlockRect change states on their own, and they can put the states they change onto the dirty list so drawprim will reset them to what the application wants. This avoids gratuitous setting and resetting. An extra last_was_blt field could be used to avoid BltOverride setting its own states again and again.
Display lists could possibly help there as well, and might be easier / faster than integrating those functions into the state management. You would have to benchmark that to be sure though.
I assume you won't be changing everything over in one go, so perhaps it would be best to start with the texture stage states, since those are currently reapplied every drawprim call, and should be pretty easy to group.
One concern I've got about lumping all the state changes together right before the drawprim call is that we might lose some CPU/GPU parallelism. I guess we won't be able to see how that affects things until it's pretty much done either, but it's probably a good idea to run some proper benchmarks before sending any patches in. (It might actually be a good idea to do that in general).
Not sure about the _utils part there, since it's pretty much core functionality, but fair enough.
well, we can use other names too :-) opengl_state.c, opengl_AllOtherStuff.c
vertex type) will share the same list. The index of the changed state will identify the type of the state, e.g.
    #define renderstate_entry(a)  ((a) + 0)
    #define samplerstate_entry(a) ((a) + 1000) /* or renderstate_entry(max_render_state) */
    #define texturestate_entry(a) ((a) + 2000)
and so on.
This will allow us to group different states, e.g. D3DRS_LIGHTINGENABLE with the vertex type.
Can't you do that without putting everything in a single list?
Basically yes, but is there a problem with a single list? I think it makes the code simpler to have a single list.
As many d3d states affect the same opengl state (e.g. D3DRS_FOGVERTEXMODE, D3DRS_FOGTABLEMODE, D3DRS_FOGSTART, D3DRS_FOGEND), the states can be grouped together for efficient application. This also works across different types.
I think typically that only goes for states of the same type, i.e. different render states that affect the same GL state or different texture stages that affect the same GL state, but not so much across state types. I.e., it should be fairly uncommon for a render state to affect the same state as a texture stage state.
Yes for texture stage states and render states (*), but for example the vertex type affects lighting and fog, which are also affected by render states.
(*) In earlier versions texture stage states used to be render states(D3DRENDERSTATE_MINFILTER, D3DRENDERSTATE_MAGFILTER, D3DRENDERSTATE_TEXTUREMAPBLEND, ...). But those states are forwarded to texture stage states / sampler states in ddraw, so we don't see them in wined3d.
States will be applied in drawprim. BltOverride and UnlockRect change states on their own, and they can put the states they change onto the dirty list so drawprim will reset them to what the application wants. This avoids gratuitous setting and resetting. An extra last_was_blt field could be used to avoid BltOverride setting its own states again and again.
Display lists could possibly help there as well, and might be easier / faster than integrating those functions into the state management. You would have to benchmark that to be sure though.
My idea is to write the state-applying code so that it can record the setting of a number of states into a display list, for easier use in stateblocks. When creating a stateblock we could do this:
    glNewList(stateblockImpl->glList, GL_COMPILE); /* glList obtained from glGenLists(1) */
    for (i = 0; i < all_known_states; i++)
        set_state(i);
    glEndList();
StateBlock::Apply can then use the list to set the states:
    glCallList(This->glList);
    clear_dirty_state_list();
I assume you won't be changing everything over in one go, so perhaps it would be best to start with the texture stage states, since those are currently reapplied every drawprim call, and should be pretty easy to group.
Yup. Although I will start with render states, because breaking up the texture stage state + sampler state + active texture compound will be a bit tricky.
One concern I've got about lumping all the state changes together right before the drawprim call is that we might lose some CPU/GPU parallelism. I guess we won't be able to see how that affects things until it's pretty much done either, but it's probably a good idea to run some proper benchmarks before sending any patches in. (It might actually be a good idea to do that in general).
Ack.
Although I think regarding parallelism it can't be worse than it is at the moment :-)
Well, the number of sampler stages (and thus the number of sampler stage states) for example is dependent on hardware limits, so you can't really use the "#define STATE_SAMPLER(b, a) STATE_RENDER(WINEHIGHEST_RENDER_STATE + WINEHIGHEST_SAMPLER_STATE * b + a)" macro since WINEHIGHEST_SAMPLER_STATE would be non-constant.
On Friday 20 October 2006 00:04, H. Verbeet wrote:
Well, the number of sampler stages (and thus the number of sampler stage states) for example is dependent on hardware limits, so you can't really use the "#define STATE_SAMPLER(b, a) STATE_RENDER(WINEHIGHEST_RENDER_STATE + WINEHIGHEST_SAMPLER_STATE * b + a)" macro since WINEHIGHEST_SAMPLER_STATE would be non-constant.
include/wine/wined3d_types.h:
    #define WINED3D_HIGHEST_SAMPLER_STATE WINED3DSAMP_DMAPOFFSET
    WINED3DSAMP_DMAPOFFSET = 13,
and dlls/wined3d/wined3d_private.h:
    #define MAX_SAMPLERS 16
On 20/10/06, Stefan Dösinger stefandoesinger@gmx.at wrote:
include/wine/wined3d_types.h:
    #define WINED3D_HIGHEST_SAMPLER_STATE WINED3DSAMP_DMAPOFFSET
    WINED3DSAMP_DMAPOFFSET = 13,
Those are per stage.
and dlls/wined3d/wined3d_private.h:
    #define MAX_SAMPLERS 16
That's the maximum d3d9 supports, not necessarily what the hardware can do. Unsupported samplers would be wasted. Worse, d3d10 will support a lot more than 16 samplers.
Hi
and dlls/wined3d/wined3d_private.h:
    #define MAX_SAMPLERS 16
That's the maximum d3d9 supports, not necessarily what the hardware can do. Unsupported samplers would be wasted. Worse, d3d10 will support a lot more than 16 samplers.
Right, we would waste 13*8 = 104 bytes per unsupported sampler. How many samplers does d3d10 support?
If we use run-time dynamic values for the state table, then we lose the ability to access a state by its state number, and we lose the ability to set up a constant table in the code. We'd have to allocate it at device creation, fill it there and keep a table per device. I prefer to waste 10 or 20 kB on unsupported textures and samplers in exchange for being able to index the table with the state number and to declare it constant. If the memory is needed, the operating system can simply discard the constant table's pages in this case (they can be reloaded from the binary); otherwise they would have to be paged out.
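A sketch of the constant, directly indexed table, using the STATE_SAMPLER macro quoted above with WINED3D_HIGHEST_SAMPLER_STATE and MAX_SAMPLERS from the listed headers. The value 209 for WINEHIGHEST_RENDER_STATE is an assumption (WINED3DRS_BLENDOPALPHA), and APPLYSTATEFUNC is simplified. Because sampler state numbers start at 1, a stride of 13 per sampler does not overlap. The 104 bytes per unused sampler assumes an 8-byte struct StateEntry, i.e. a 32-bit build; with 64-bit pointers the entry is 16 bytes.

```c
#include <stddef.h>

typedef unsigned int DWORD;                   /* stand-in for the Windows type */
typedef void (*APPLYSTATEFUNC)(DWORD state);  /* simplified, hypothetical signature */

struct StateEntry {
    DWORD representative;
    APPLYSTATEFUNC func;
};

#define WINEHIGHEST_RENDER_STATE      209  /* assumed value */
#define WINED3D_HIGHEST_SAMPLER_STATE 13   /* WINED3DSAMP_DMAPOFFSET */
#define MAX_SAMPLERS                  16

#define STATE_RENDER(a)     (a)
#define STATE_SAMPLER(b, a) STATE_RENDER(WINEHIGHEST_RENDER_STATE + WINED3D_HIGHEST_SAMPLER_STATE * (b) + (a))
#define STATE_HIGHEST       STATE_SAMPLER(MAX_SAMPLERS - 1, WINED3D_HIGHEST_SAMPLER_STATE)

/* All indices are compile-time constants, so the table can be const and is
 * indexed directly by the state number. Omitted entries are zero-filled. */
static const struct StateEntry StateTable[STATE_HIGHEST + 1] = {
    [STATE_SAMPLER(0, 1)] = { STATE_SAMPLER(0, 1), NULL },
};

/* Bytes spent on the entries of one sampler the hardware does not have. */
static size_t bytes_per_unused_sampler(void)
{
    return WINED3D_HIGHEST_SAMPLER_STATE * sizeof(struct StateEntry);
}
```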
On 20/10/06, Stefan Dösinger stefandoesinger@gmx.at wrote:
Right, we would waste 13*8 = 104 bytes per unsupported sampler. How many samplers does d3d10 support?
Something like 128.
If we use run-time dynamic values for the state table, then we lose the ability to access a state by its state number, and we lose the ability to set up a constant table in the code.
Well, no, you just wouldn't dump everything in one big list. Sure, lumping everything together in one big list would work, but that doesn't mean it's pretty.
On Friday 20 October 2006 11:15, H. Verbeet wrote:
On 20/10/06, Stefan Dösinger stefandoesinger@gmx.at wrote:
Right, we would waste 13*8 = 104 bytes per unsupported sampler. How many samplers does d3d10 support?
Something like 128.
128 is big, agreed.
If we use run-time dynamic values for the state table, then we lose the ability to access a state by its state number, and we lose the ability to set up a constant table in the code.
Well, no, you just wouldn't dump everything in one big list. Sure, lumping everything together in one big list would work, but that doesn't mean it's pretty.
Hmm, the sampler states are per sampler in d3d, and per texture object in gl as far as I know. So we have to find some way to tell whether the sampler states have changed relative to the texture used for drawing. This is getting tricky :-/
Texture stage states are per stage in both d3d and opengl afaik (except for those that are wrapped to samplers in ddraw/d3d8). Any idea how many texture stages d3d10 supports, if the concept still exists at all?
On 20/10/06, Stefan Dösinger stefandoesinger@gmx.at wrote:
Hmm, the sampler states are per sampler in d3d, and per texture object in gl as far as I know. So we have to find some way to tell whether the sampler states have changed relative to the texture used for drawing. This is getting tricky :-/
Yes, sRGB support has a problem with that as well.
Texture stage states are per stage in both d3d and opengl afaik (except for those that are wrapped to samplers in ddraw/d3d8). Any idea how many texture stages d3d10 supports, if the concept still exists at all?
I don't know for sure, but I doubt it. Texture stages are pretty much a fixed function thing.
On Friday 20 October 2006 12:00, H. Verbeet wrote:
On 20/10/06, Stefan Dösinger stefandoesinger@gmx.at wrote:
Hmm, the sampler states are per sampler in d3d, and per texture object in gl as far as I know. So we have to find some way to tell whether the sampler states have changed relative to the texture used for drawing. This is getting tricky :-/
Yes, sRGB support has a problem with that as well.
Things like sRGB, where the concept of activating it is entirely different between d3d and opengl, won't fit into the state management anyway. We will have to take care of sRGB in SetTexture, SetSamplerState and maybe ApplyStateBlock (depending on what Ivan's tests show).
Looks like we have to deal with sampler states separately; the way they are set differs too much between d3d and gl.
Texture stage states are per stage in both d3d and opengl afaik (except for those that are wrapped to samplers in ddraw/d3d8). Any idea how many texture stages d3d10 supports, if the concept still exists at all?
I don't know for sure, but I doubt it. Texture stages are pretty much a fixed function thing.
Yup. But you never know with MS; some texture stage states affect shaders too. Time to check the dx10 SDK.
Yes, sRGB support has a problem with that as well.
Things like sRGB, where the concept of activating it is entirely different between d3d and opengl, won't fit into the state management anyway. We will have to take care of sRGB in SetTexture, SetSamplerState and maybe ApplyStateBlock (depending on what Ivan's tests show).
Ah yeah, and similar things for states that only affect shader compilation, like sRGBWRITEENABLE.
and we lose the ability to set up a constant table in the code.
The constant table is usually a bad idea, and this demonstrates why - the texture format table is another one where somehow we've been able to get away with staying constant, but I am sure issues will show up in the future involving extensions and conditional support for formats (or conflicting support from two extensions ?), which will prove that table to be insufficient as well. When I considered changing this table on AJ's request (before your patch), I started writing it as a big switch statement for that reason [ not to say I like big switch statements, but it seemed more flexible that way ].
Constant is convenient, but if it can't meet all necessary requirements, I wouldn't hesitate to drop the idea - never compromise on design in favor of C optimizations. Tomorrow's hardware will make any non-algorithmic optimizations irrelevant.
On 10/22/06, Ivan Gyurdiev ivg231@gmail.com wrote:
Constant is convenient, but if it can't meet all necessary requirements, I wouldn't hesitate to drop the idea - never compromise on design in favor of C optimizations. Tomorrow's hardware will make any non-algorithmic optimizations irrelevant.
While this is true for most things, it shouldn't be applied in all cases. For things like graphics processing, I would say every bit of optimization is worth it, even at the expense of a little design flexibility.
Keep in mind that having everyone in the world constantly upgrading their hardware because of attitudes like this is not sustainable -- a far better future would be where a standard computer is cheaper, needs less power, produces less noise and heat, and just does its job.
n0dalus.
n0dalus wrote:
On 10/22/06, Ivan Gyurdiev ivg231@gmail.com wrote:
Constant is convenient, but if it can't meet all necessary requirements, I wouldn't hesitate to drop the idea - never compromise on design in favor of C optimizations. Tomorrow's hardware will make any non-algorithmic optimizations irrelevant.
While this is true for most things, it shouldn't be applied in all cases. For things like graphics processing, I would say every bit of optimization is worth it, even at the expense of a little design flexibility.
Bah.. excuses for bad design. Constant-time access is important, but you need to index on the right thing - see other mail.
Keep in mind that having everyone in the world constantly upgrading their hardware because of attitudes like this is not sustainable --
Sure it is, my computer at work disagrees w/ you.
a far better future would be where a standard computer is cheaper, needs less power, produces less noise and heat, and just does its job.
Why? Just like you upgrade software to get new features and solve problems, you should upgrade hardware for the same purpose. Some problems are best solved in the hardware, rather than wasting programmers' time. What is it with software developers and old computers ?
On Sunday 22 October 2006 10:31, you wrote:
Warning: lots of rant follows; for the specific answers to ivg2's concerns, see the end of the mail
n0dalus wrote:
On 10/22/06, Ivan Gyurdiev ivg231@gmail.com wrote:
Constant is convenient, but if it can't meet all necessary requirements, I wouldn't hesitate to drop the idea - never compromise on design in favor of C optimizations. Tomorrow's hardware will make any non-algorithmic optimizations irrelevant.
While this is true for most things, it shouldn't be applied in all cases. For things like graphics processing, I would say every bit of optimization is worth it, even at the expense of a little design flexibility.
Bah.. excuses for bad design. Constant-time access is important, but you need to index on the right thing - see other mail.
Which leads to stuff like Java or .NET, where you need ~100MB of runtime to run a hello world app. Bigger Java / .NET apps feel bloated even on shiny new hardware (my personal impression, I have no statistics).
Keep in mind that having everyone in the world constantly upgrading their hardware because of attitudes like this is not sustainable --
Sure it is, my computer at work disagrees w/ you.
Well, the thing is that I want yesterday's games to run on yesterday's hardware. For that, upgrading hardware to improve performance is *not* an option.
Linux does a good job of running properly even on old hardware - I have an up-to-date Gentoo setup on my old notebook (120 MHz, 16 MB RAM) set up as a router, and it runs as well as the old SuSE 6.4 I used to have on that notebook. I think we shouldn't waste that potential :-)
Well, my performance goals are basically to be able to run games as fast as on Windows on the same hardware, and to run games on hardware that fulfills the games' minimum requirements. Some more specific targets:
Run Half-Life 2 at >60 fps in dxlevel 90 with everything enabled on my gf7600 amd64 dual core box and on an Intel Mac (Core Duo ~2 GHz, Radeon X1600)*
Run Half-Life 2 playable on my notebook(1.6 ghz pentium m, radeon M9) (**)
Get 14000 3dmarks on my notebook in 3dmark2000
Run older stuff (Tomb Raider 3, Moto Racer 2, Empire Earth, Diablo 2) on my brother's notebook (700 MHz, 128 MB RAM, ATI Mach64 :-D (***) )
* Might be impossible because Mac OS X is bloated and only runs that well because Apple uses superior hardware
** needs better ati drivers :-o
*** easy because the drivers are way better than the Windows ones.
a far better future would be where a standard computer is cheaper, needs less power, produces less noise and heat, and just does its job.
Why? Just like you upgrade software to get new features and solve problems, you should upgrade hardware for the same purpose. Some problems are best solved in the hardware, rather than wasting programmers' time. What is it with software developers and old computers ?
Well, as we all know, upgrading is enforced the Wintel way via file formats and so on. Typical users use Office 2003 for the same things they used Office 95 for, but try to do anything proper with Office 95 these days ;-)
I dislike that attitude, so I try to make newer versions of my own code run as fast as or faster than older versions on the same hardware with the same features. In the commercial world, upgrading hardware is preferred over writing fast code because hardware upgrades are cheaper and easier as long as they stay within acceptable bounds. I have seen commercial software development in a company (I'm not talking about cw here), and I have known open source development for 3 years, and the open source model, with little to no financial pressure, offers a good way to develop proper software instead of upgrading hardware :-)
--------rant end-----------
Ivan, your concerns about the flexibility of a static table are valid. The const pixel format table has hit limits with CheckDeviceFormat and sRGB textures. The solution for CheckDeviceFormat was to take it out of the table, and for sRGB Phil Costin decided to use a second table for sRGB formats. However, I do not think that something more dynamic, like a switch-case statement or a non-const table, would change anything. CheckDeviceFormat would still stay as it is, and for sRGB we would need an equivalent switch, just as with the const table.
Regarding the sampler states, I claim that the biggest problem wouldn't be solved by a changeable state table or by code handling it. The problem is that sampler settings are per texture in gl, and per device in d3d (unless I am completely misinformed). The biggest issue with that is keeping track of changes. Render states and other things like matrices, shaders, viewport, ..., do not have that problem.
What the current code basically does is:
-> bind texture
-> apply sampler states
A first improvement would be:
-> if the texture is different, bind the new texture
-> check each sampler state against the old values for the texture, and if different, apply the new state
I am afraid that there is not much improvement possible over that.
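A sketch of that first improvement, assuming a hypothetical per-texture cache of the sampler values last applied with glTexParameter. The GL calls are replaced by counting stubs so it runs standalone, and every sampler state is treated as a texture parameter, which is a simplification.

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_SAMPLER_STATES 14  /* d3d sampler states 1..13 (WINED3DSAMP_DMAPOFFSET = 13); 0 unused */

/* Hypothetical texture object: remembers the sampler values last applied to
 * it with glTexParameter, since in GL those live in the texture object. */
struct texture {
    unsigned int gl_name;
    uint32_t applied[NUM_SAMPLER_STATES];
};

/* Hypothetical per-sampler d3d state: what the application asked for. */
struct sampler {
    uint32_t states[NUM_SAMPLER_STATES];
    struct texture *bound;
};

static unsigned int bind_calls, param_calls;   /* count simulated GL calls */

static void fake_bind_texture(struct texture *tex) { (void)tex; bind_calls++; }

static void fake_tex_parameter(struct texture *tex, int state, uint32_t value)
{
    (void)tex; (void)state; (void)value;
    param_calls++;
}

/* Bind only if the texture changed, then compare each sampler state with the
 * value last applied to this texture and reapply only what differs. */
static void bind_and_verify(struct sampler *s, struct texture *tex)
{
    if (s->bound != tex) {
        fake_bind_texture(tex);
        s->bound = tex;
    }
    for (int i = 1; i < NUM_SAMPLER_STATES; i++) {
        if (tex->applied[i] != s->states[i]) {
            fake_tex_parameter(tex, i, s->states[i]);
            tex->applied[i] = s->states[i];
        }
    }
}
```

The per-texture cache is what avoids redundant glTexParameter calls when switching back to a texture whose parameters already match.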
The approach with my constant table would be to group the sampler states for sampler X with the texture bound to sampler X, which basically means "if the sampler states or the texture changed, reapply (or verify) sampler and texture". Alternatively, we could leave them ungrouped and have the function handling the bound textures trigger an extra check of the sampler: "if the texture is changed, apply the texture and apply/verify the sampler states", and "if the sampler states are changed, reapply them (and do not care about the texture)".
Another difference between samplers and texture stage states, render states, ..., is that samplers are still in effect with shaders, while the rest is part of the fixed pipeline. Thus samplers may be subject to change in dx10 and future versions, while the rest is pretty much in its final form (unless MS decides to reintroduce the fixed pipeline in dx11).
That said, I do not claim that my constant table is the ultimate solution. I have a patch for render states that is in principle a no-op: it moves them to a different file and replaces the switch statement with the table, but does no dirtifying yet. I will add shaders, matrices, ..., to it and see how it works out before sending patches. Samplers will not go into that table; I think we shouldn't force apples and pears together. I am rather thinking about a dirty list per supported sampler (including the bound texture) in addition to the dirty state list. SetTexture and SetSamplerState for a certain sampler will dirtify the pixel shader state to verify the source samplers used in the shader.
Anyone who has other suggestions is free to make them concrete. Henri's trees turned out to be basically the same as my idea, except that Henri preferred a list for the dirty states while I originally planned an array. We decided to go for a list because of more efficient memory usage. (Or is there anything else I missed? Lionel talked about trees too, but I don't know what exactly his idea was.)
On Sunday 22 October 2006 10:31, Ivan Gyurdiev wrote:
I think my first reply was a bit off-topic, so I should describe my idea with the table a bit better :-)
Each entry in the state table has 2 fields. They do not strictly depend on each other:
    struct StateEntry {
        DWORD representative;
        APPLYSTATEFUNC func;
    };
The representative is a way to group states which depend on each other. For example, WINED3DRS_ALPHATESTENABLE, WINED3DRS_ALPHAREF and WINED3DRS_COLORKEYENABLE go into the same gl state, so they are grouped. The reason for that lies in the dirty list. Instead of pushing an element to the list with
    device->addDirtyState(This, State); /* Later per context */
SetRenderState will do this:
    device->addDirtyState(This, States[State].representative);
This way the code applying the states has to be called only once.
The func entry contains a pointer to the function that sets the state. This is still a bunch of code, not a magic instruction on how to call gl directly. To apply a state to gl,
    States[State].func(State, stateblock);
is called. Basically this is equivalent to the current SetRenderState code:
    switch(state) {
        case WINED3DRS_LIGHTING:
            <do something>
            break;
        case WINED3DRS_SOMETHINGELSE:
            <...>
    }
I think that in that case, inline functions containing the <do something> block look nicer than the current SetRenderState.
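A sketch of table dispatch replacing the switch, with hypothetical RS_* ids, a trimmed stateblock and handlers that only record what they were called with. Each entry names its group's representative, and the hypothetical apply_state() calls the representative's handler, so the alpha test handler sees the same state value whichever alpha-related state triggered it.

```c
#include <stddef.h>

typedef unsigned int DWORD;                      /* stand-in for the Windows type */
struct stateblock { DWORD renderState[4]; };     /* trimmed, hypothetical */
typedef void (*APPLYSTATEFUNC)(DWORD state, struct stateblock *stateblock);

struct StateEntry {
    DWORD representative;
    APPLYSTATEFUNC func;
};

/* Hypothetical stand-ins for the WINED3DRS_* values. */
enum { RS_ALPHATESTENABLE, RS_ALPHAREF, RS_COLORKEYENABLE, RS_LIGHTING, RS_COUNT };

static DWORD last_alpha_state, last_lighting_value;
static int alpha_calls;

/* One handler for the whole alpha group; real code would read all members
 * from the stateblock and call glEnable(GL_ALPHA_TEST) / glAlphaFunc(). */
static void state_alpha(DWORD state, struct stateblock *sb)
{
    (void)sb;
    last_alpha_state = state;
    alpha_calls++;
}

static void state_lighting(DWORD state, struct stateblock *sb)
{
    last_lighting_value = sb->renderState[state];
}

static const struct StateEntry StateTable[RS_COUNT] = {
    [RS_ALPHATESTENABLE] = { RS_ALPHATESTENABLE, state_alpha },
    [RS_ALPHAREF]        = { RS_ALPHATESTENABLE, state_alpha },
    [RS_COLORKEYENABLE]  = { RS_ALPHATESTENABLE, state_alpha },
    [RS_LIGHTING]        = { RS_LIGHTING,        state_lighting },
};

/* Replaces one case of the switch: look up the group, call its handler. */
static void apply_state(DWORD state, struct stateblock *sb)
{
    DWORD rep = StateTable[state].representative;
    StateTable[rep].func(rep, sb);
}
```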
I think you agree that the state grouping is a bit inflexible. For example, the vertex declaration affects the fog settings, so to be consistent we would have to group them. However, as other things depend on the vertex decl too, this would become far too big a group. The bigger the groups are, the more likely it is that one of their states is changed, so this would eat performance.
While the vertex decl can change the fog, the fog settings will never affect the vertex declaration, so we have a sort of one-way dependency(compared to the alpha example). To overcome this we can still use some code in the function. For example state_fog(state, stateblock) can read the vertex declaration to decide what to do. In the same way it is perfectly fine for misc_vdecl(state, stateblock) to call state_fog() either unconditionally or when it thinks that fog needs to be verified. By doing that dirtifying the fog does not require all vertex data pointers to be updated, while the fog is still updated when the vertex decl is changed but the fog settings aren't.
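A sketch of the one-way fog/vertex declaration dependency. The stateblock fields, the fog_source enum and the idea that the declaration decides between a fog coordinate and depth-based fog are hypothetical simplifications; what matters is that misc_vdecl() calls state_fog(), while dirtifying fog alone never touches the vertex setup.

```c
typedef unsigned int DWORD;   /* stand-in for the Windows type */

/* Hypothetical, trimmed stateblock. */
struct stateblock {
    int fog_enable;           /* e.g. WINED3DRS_FOGENABLE */
    int vdecl_has_fog_coord;  /* hypothetical summary of the vertex declaration */
};

enum fog_source { FOG_OFF, FOG_FROM_DEPTH, FOG_FROM_COORD };

static enum fog_source current_fog;   /* stands in for the GL fog state */
static int vdecl_calls;

/* Fog handler: reads the foreign vertex declaration state to decide what to do. */
static void state_fog(DWORD state, struct stateblock *sb)
{
    (void)state;
    if (!sb->fog_enable)
        current_fog = FOG_OFF;
    else if (sb->vdecl_has_fog_coord)
        current_fog = FOG_FROM_COORD;
    else
        current_fog = FOG_FROM_DEPTH;
}

/* Vertex declaration handler: updates the vertex setup, then re-verifies fog.
 * The dependency is one-way: state_fog() never calls back into here. */
static void misc_vdecl(DWORD state, struct stateblock *sb)
{
    vdecl_calls++;            /* real code would set up the vertex data pointers */
    state_fog(state, sb);
}
```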
One might argue that calling a foreign state function and reading a foreign state is not a nice design. Ok - But a switch statement(or bunch of if-blocks) wouldn't be better off. If you look at the current code, the fog things are scattered all over SetRenderState and drawprim.
Then there are other sorts of states, like sampler states and texture stage states. Texture stage states can go into the global state list just fine. One issue is that different stages depend on each other, due to D3DTOP_DISABLE. To handle that properly we need to track the highest enabled texture stage and the lowest disabled texture stage per context, plus some code in drawprim which takes care of the stages in between (that will be 0 stages in general, so just an if check).
Sampler states and bound textures are more difficult, as they are per-texture settings in gl and per-device settings in d3d. There is no point in putting them into the main state list; instead there should be a dirty list per supported sampler. This list will contain the bound texture and the sampler states, and use a similar dirtification mechanism as the rest of the states. There is a one-way dependency too: a change of the texture may require a reapplication of the sampler states, but a change of a sampler state will not require a change of the texture.
A similar list will group sampler states together, and the function binding the texture will check the sampler states last used with the texture against the states in the stateblock and, if they are different, call the sampler-state-specific functions to reapply them. The only way around this would be a per-texture dirty sampler state list, which would make SetSamplerState far too expensive...
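A sketch of the texture stage cascade check, assuming 8 fixed-function stages and a hypothetical per-context count of enabled GL units (one number instead of tracking the highest enabled and lowest disabled stage separately). D3DTOP_DISABLE and D3DTOP_MODULATE use their D3DTEXTUREOP values; the delta returned corresponds to the "if check" in drawprim and is 0 in the common case.

```c
#define MAX_TEXTURES    8   /* hypothetical fixed-function stage limit */
#define D3DTOP_DISABLE  1   /* values as in the D3DTEXTUREOP enum */
#define D3DTOP_MODULATE 4

/* The first stage whose color op is DISABLE ends the cascade; all later
 * stages are ignored. Returns MAX_TEXTURES if no stage is disabled. */
static unsigned int lowest_disabled_stage(const unsigned int colorop[MAX_TEXTURES])
{
    unsigned int i;

    for (i = 0; i < MAX_TEXTURES; i++)
        if (colorop[i] == D3DTOP_DISABLE)
            break;
    return i;
}

/* Hypothetical per-context bookkeeping: how many GL units are enabled. */
struct context {
    unsigned int enabled_units;
};

/* drawprim-side fixup. Returns the change in enabled units:
 *   < 0: units [new, old) must be disabled in GL
 *   > 0: units [old, new) come back; as an assumption of this sketch,
 *        their stage states would have to be re-applied
 *   = 0: the common case, nothing to do */
static int fixup_stages(struct context *ctx, const unsigned int colorop[MAX_TEXTURES])
{
    unsigned int first_off = lowest_disabled_stage(colorop);
    int delta = (int)first_off - (int)ctx->enabled_units;

    ctx->enabled_units = first_off;
    return delta;
}
```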
I hope that explained the whole plan a bit better than my last mail :-)
Stefan
    device->addDirtyState(This, States[State].representative);
This way the code applying the states has to be called only once.
What does addDirtyState() do when called multiple times with the same representative?
This is still a bunch of code, not a magic instruction on how to call gl directly. To apply a state to gl,
    States[State].func(State, stateblock);
How would I mark vertex shader constant #3653 dirty, and apply it using your mechanism ?
I think you agree that the state grouping is a bit inflexible. For example, the vertex declaration affects the fog settings, so to be consistent we would have to group them. However, as other things depend on the vertex decl too, this would become far too big a group. The bigger the groups are, the more likely it is that one of their states is changed, so this would eat performance.
I think you're mixing dirtification of states with mapping which function applies those states, and I don't fully understand how this is going to work yet...
Sampler states and bound textures are more difficult, as they are per-texture settings in gl and per device settings in d3d.
That doesn't sound entirely correct - there are per-texture-unit settings in GL, and per-sampler settings in d3d. I thought we had to map the samplers to GL texture units.
There is no point in putting them into the main state list, instead there should be a dirty list per supported sampler.
Again, it seems to me that two different concepts are being mixed together - I certainly don't want to be typing the function to apply per sampler, when there should be one entry in a table somewhere, with the sampler passed as an argument. Dirtification is another topic, which I don't think you can solve with a list anyway.
I hope that explained the whole plan a bit better than my last mail :-)
Sure, but I'm still confused :)
On Monday 23 October 2006 00:57, you wrote:
What does addDirtyState() do when called multiple times with the same representative?
There is a stateblock->changed.<somestate> field in the stateblock, which is a boolean right now. This can be made a pointer to the list element, set to the element when the state is first added to the dirty list, and set to NULL when the state is applied. When addDirtyState finds that a state already has a list entry, it doesn't have to dirtify it again.
This is still a bunch of code, not a magic instruction on how to call gl directly. To apply a state to gl,
    States[State].func(State, stateblock);
How would I mark vertex shader constant #3653 dirty, and apply it using your mechanism ?
Uhh, shader constants, I must have forgotten them :-( . Well, Henri and I silently agreed to leave them as they are right now. I think they have a similar issue to the sampler states. I have to check how and when to trigger an upload of the shader constants.
I think you agree that the state grouping is a bit inflexible. For example, the vertex declaration affects the fog settings, so to be consistent we would have to group them. However, as other things depend on the vertex decl too, this would become far too big a group. The bigger the groups are, the more likely it is that one of their states is changed, so this would eat performance.
I think you're mixing dirtification of states with mapping which function applies those states, and I don't fully understand how this is going to work yet...
Well, what I wanted to say with this is that the representative grouping is not the last thing we have, and that we can implement more fine grained control in the code where needed.
Sampler states and bound textures are more difficult, as they are per-texture settings in gl and per device settings in d3d.
That doesn't sound entirely correct - there are per-texture-unit settings in GL, and per-sampler settings in d3d. I thought we had to map the samplers to GL texture units.
I mean things set with glTexParameter. The man page is not really clear on that, but I think the red book mentions that texture parameters are per texture object, and not per stage. I'll check that again.
There is no point in putting them into the main state list, instead
there should be a dirty list per supported sampler.
Again, it seems to me that two different concepts are being mixed together - I certainly don't want to be typing the function to apply per sampler, when there should be one entry in a table somewhere, with the sampler passed as an argument. Dirtification is another topic, which I don't think you can solve with a list anyway.
No worries, we don't need a separate function for each sampler :-) The apply function gets a DWORD state value passed, which it can use to find out which sampler to apply things to and where to read the values from.
Dirtification won't work properly for samplers, right. We will have to check the 13 sampler states when a different texture is used.
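To show how one apply function can serve every sampler, here is a sketch assuming a state numbering like the `STATE_SAMPLER(b, a)` macro proposed later in the thread. `HIGHEST_RENDER_STATE`, `MAX_SAMPLERS`, `decode_sampler_state` and `sampler_apply` are hypothetical names with illustrative values; D3D9 sampler state types are numbered 1..13.

```c
#include <assert.h>

typedef unsigned int DWORD;

/* Illustrative limits, not taken from wined3d. */
#define HIGHEST_RENDER_STATE  209
#define HIGHEST_SAMPLER_STATE 13
#define MAX_SAMPLERS          16

#define STATE_RENDER(a)      (a)
#define STATE_SAMPLER(b, a)  STATE_RENDER(HIGHEST_RENDER_STATE + HIGHEST_SAMPLER_STATE * (b) + (a))
#define STATE_IS_SAMPLER(s)  ((s) > HIGHEST_RENDER_STATE && \
                              (s) <= STATE_SAMPLER(MAX_SAMPLERS - 1, HIGHEST_SAMPLER_STATE))

/* Recover the sampler index and the 1-based sampler state type. */
static void decode_sampler_state(DWORD state, DWORD *sampler, DWORD *type)
{
    DWORD offset;
    assert(STATE_IS_SAMPLER(state));
    offset = state - HIGHEST_RENDER_STATE - 1;        /* 0-based within the sampler block */
    *sampler = offset / HIGHEST_SAMPLER_STATE;
    *type = offset % HIGHEST_SAMPLER_STATE + 1;
}

struct stateblock {
    DWORD samplerState[MAX_SAMPLERS][HIGHEST_SAMPLER_STATE + 1];
};

/* Records what was applied, in place of real GL calls. */
static DWORD last_applied[MAX_SAMPLERS][HIGHEST_SAMPLER_STATE + 1];

/* One function for all samplers: the state number says which one. */
static void sampler_apply(DWORD state, const struct stateblock *sb)
{
    DWORD sampler, type;
    decode_sampler_state(state, &sampler, &type);
    /* Real code would bind the sampler's GL texture and call glTexParameteri() here. */
    last_applied[sampler][type] = sb->samplerState[sampler][type];
}
```

With this layout a table would still need one slot per (sampler, type) pair, but every slot can point at the same `sampler_apply`.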
Stefan Dösinger wrote:
On Monday 23 October 2006 00:57, you wrote:
What does addDirtyState() do when called multiple times with the same representative?
There is a stateblock->changed.<somestate> field in the stateblock, which is a boolean right now. This can be made a pointer to the list element, set to the element when the state is first added to the dirty list, and set to NULL when the state is applied. When addDirtyState finds that a state already has a list entry, it doesn't have to dirtify it again.
Ok, but that sounds rather messy...
This is still a bunch of code, not a magic instruction for calling gl directly. To apply a state to gl:
States[State]->func(State, stateblock);
How would I mark vertex shader constant #3653 dirty, and apply it using your mechanism ?
Uhh, shader constants, must have forgotten them :-( . Well, Henri and I silently agreed to leave them as they are right now.
I don't like how the number of things staying "as they are right now" is growing, while the number of things being changed remains confined to render states. To have a proof-of-concept state management system, it would be best to take things that are as different as possible, and manage to get them successfully updated via the new state manager. Otherwise you won't find out whether the design is flawed or not until much later.
There is no point in putting them into the main state list, instead
there should be a dirty list per supported sampler.
Again, it seems to me that two different concepts are being mixed together - I certainly don't want to be typing the function to apply per sampler, when there should be one entry in a table somewhere, with the sampler passed as an argument. Dirtification is another topic, which I don't think you can solve with a list anyway.
No worries, we don't need a separate function for each sampler :-) The apply function gets a DWORD state value passed, which it can use to find out which sampler to apply things to and where to read the values from.
Yes, but it sounds like you need an entry for this function into the map for each sampler... map { SAMPLER(0) -> sampler_apply_function } map { SAMPLER(1) -> sampler_apply_function } map { SAMPLER(2) ...
...and that seems wrong, but maybe I'm misunderstanding.
On 23/10/06, Ivan Gyurdiev ivg231@gmail.com wrote:
Uhh, shader constants, must have forgotten them :-( . Well, Henri and I silently agreed to leave them as they are right now.
I don't like how the number of things staying "as they are right now" is growing, while the number of things being changed remains confined to render states. To have a proof-of-concept state management system, it would be best to take things that are as different as possible, and manage to get them successfully updated via the new state manager. Otherwise you won't find out whether the design is flawed or not until much later.
Well, yes. Shader constants might not be the best example for that (the "as different as possible" part, that is) though, for reasons mentioned in my mail above. Shader constants do illustrate my point about not putting everything in the same list though, possibly better than samplers do. The number of supported uniforms varies quite a bit between cards, more so than the 128 samplers for d3d10 mentioned further up above.
On Monday 23 October 2006 08:43, you wrote:
Stefan Dösinger wrote:
On Monday 23 October 2006 00:57, you wrote:
What does addDirtyState() do when called multiple times with the same representative?
There is a stateblock->changed.<somestate> field in the stateblock, which is a boolean right now. This can be made a pointer to the list element, set to the element when the state is first added to the dirty list, and set to NULL when the state is applied. When addDirtyState finds that a state already has a list entry, it doesn't have to dirtify it again.
Ok, but that sounds rather messy...
Well, we don't need a pointer to the list element, a BOOL changed will still do the job. All we need then is a "if(renderstate[representative].changed) return;"
I don't like how the number of things staying "as they are right now" is growing, while the number of things being changed remains confined to render states. To have a proof-of-concept state management system, it would be best to take things that are as different as possible, and manage to get them successfully updated via the new state manager. Otherwise you won't find out whether the design is flawed or not until much later.
Agreed, in theory. But I also think that we shouldn't force different things into one model. If some things work differently, why shouldn't we handle them separately?
Shader constants are different from fixed function things in two ways: There is a dynamic number of them, and there are no dependencies between 2 shader constants. So there is no use for a table like the fixed function elements, which on the other hand does not mean that a table and dirty list is pointless for render states and other fixed function things.
I said "leave them as they are" because Henri claimed that constant management is done properly already. I don't know that code, and if it needs improvement I'm not saying we can't change it :-) . But if there are different needs than for fixed function states, we don't have to use the same data structure / code either :-)
Yes, but it sounds like you need an entry for this function into the map for each sampler... map { SAMPLER(0) -> sampler_apply_function } map { SAMPLER(1) -> sampler_apply_function } map { SAMPLER(2) ...
...and that seems wrong, but maybe I'm misunderstanding.
Nah, I wouldn't create a table with own entries for up to MAX_SAMPLERS*HIGHEST_SAMPLER_STATE sampler states :-) So for samplers I could create one table containing the 13 sampler states for grouping them, but as dirtifying won't work well for samplers there is not much use for that. The idea would be to group e.g. MINFILTER and MIPFILTER, and keep a dirty list per supported sampler. But a SetSamplerState without a SetTexture on the same sampler is rather rare I think, so the gain would be limited. The 'global' dirty list however can contain a flag that one or more samplers are dirty. The handler for this dirty flag can contain code to check which sampler states have to be reapplied.
I don't like how the number of things staying "as they are right now" is growing, while the number of things being changed remains confined to render states. To have a proof-of-concept state management system, it would be best to take things that are as different as possible, and manage to get them successfully updated via the new state manager. Otherwise you won't find out whether the design is flawed or not until much later.
Agreed, in theory. But I also think that we shouldn't force different things into one model. If some things work differently, why shouldn't we handle them separately?
Because that creates confusion and disorder - the idea is to move from chaos (gl code in drawprim, shaders, device.c) to order (gl code in one place, following a uniform pattern, with the ability to do locking and/or switch the gl backend).
Shader constants are different from fixed function things in two ways: There is a dynamic number of them,
That's only a problem, because you keep trying to force everything into a constant array :)
and there are no dependencies between 2 shader constants. So there is no use for a table like the fixed function elements,
Sure there is... it creates a uniform structure for applying states, bringing some benefits as described above. Dirty-tracking is only one of the goals, and the stateblock already does some of that..
Because that creates confusion and disorder - the idea is to move from chaos (gl code in drawprim, shaders, device.c) to order (gl code in one place, following a uniform pattern, with the ability to do locking and/or switch the gl backend).
Well, there is not much difference between samplers, shaders and render states; the idea is the same in spirit, just the code is a bit different. The whole set of sampler settings, and the shader constants, can each be handled as one state in the table, and updates can be triggered in the same way. So the function verifying and setting sampler states will be a normal function like one for a render state, just a bit bigger, and the whole sampler state block will likely be marked dirty every frame. No additional magic is involved in triggering updates of the samplers as a whole. Similarly for shader constants.
e.g. Set{Vertex/Pixel}ShaderConstant* marks STATE_SHADERCONSTANT dirty, which is handled by func_shaderconstant(state, stateblock). SetPixelShader and SetVertexShader dirtify STATE_PSHADER/STATE_VSHADER, and func_pshader / func_vshader call func_shaderconstant if we use GLSL shaders, to be sure that the shader constants are correct. (The details may look slightly different, but that is basically the idea.)
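A sketch of the cross-state call described above, assuming hypothetical enum values, a flag array as the dirty list, and counters in place of GL calls. The function names follow the mail; the rule that `func_shaderconstant` clears its own dirty flag (so constants are not uploaded twice when both the shader and the constants changed) is an assumption added here.

```c
#include <assert.h>

typedef unsigned int DWORD;
typedef int BOOL;

/* Hypothetical state numbers; shaders come first so they are applied first. */
enum { STATE_VSHADER, STATE_PSHADER, STATE_SHADERCONSTANT, STATE_COUNT };

struct stateblock {
    BOOL use_glsl;              /* stand-in for the selected shader backend */
    int constant_uploads;       /* counters replace real GL calls */
    int shader_binds;
};

static BOOL dirty[STATE_COUNT];

static void func_shaderconstant(DWORD state, struct stateblock *sb)
{
    (void)state;
    sb->constant_uploads++;     /* real code: upload the dirty constants */
    dirty[STATE_SHADERCONSTANT] = 0;
}

static void func_vshader(DWORD state, struct stateblock *sb)
{
    (void)state;
    sb->shader_binds++;         /* real code: bind the shader / GLSL program */
    if (sb->use_glsl)           /* GLSL uniforms live in the program object */
        func_shaderconstant(STATE_SHADERCONSTANT, sb);
}

static void func_pshader(DWORD state, struct stateblock *sb)
{
    (void)state;
    sb->shader_binds++;
    if (sb->use_glsl)
        func_shaderconstant(STATE_SHADERCONSTANT, sb);
}

static void (*const funcs[STATE_COUNT])(DWORD, struct stateblock *) = {
    func_vshader, func_pshader, func_shaderconstant
};

static void set_shader_constant(void) { dirty[STATE_SHADERCONSTANT] = 1; }
static void set_vertex_shader(void)   { dirty[STATE_VSHADER] = 1; }

static void apply_states(struct stateblock *sb)
{
    DWORD i;
    for (i = 0; i < STATE_COUNT; i++)
        if (dirty[i]) { dirty[i] = 0; funcs[i](i, sb); }
}
```

With GLSL enabled, changing both the vertex shader and a constant results in one bind and one upload.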
Shader constants are different from fixed function things in two ways: There is a dynamic number of them,
That's only a problem, because you keep trying to force everything into a constant array :)
and there are no dependencies between 2 shader constants. So there is no use for a table like the fixed function elements,
Sure there is... it creates a uniform structure for applying states, bringing some benefits as described above. Dirty-tracking is only one of the goals, and the stateblock already does some of that..
Can you describe what this would look like in detail?
I haven't thought it through in detail, just making sure you have. I think you're going in the right direction, as long as you're considering things other than render states in the design. Won't bother you anymore until I see some more of the code :)
---
I've tracked down why wine deadlocks on my computer (miscompiled libX11 on Fedora Rawhide), so hopefully I can finish my texture/sampler testcases now, and vtf-related things [ and namespace cleanups ]. Actually VTF depend on what you're doing, and so do FBOs - they both require delayed application of certain states at draw time.
On 23/10/06, Stefan Dösinger stefandoesinger@gmx.at wrote:
How would I mark vertex shader constant #3653 dirty, and apply it using your mechanism ?
Uhh, shader constants, must have forgotten them :-( . Well, Henri and I silently agreed to leave them as they are right now. I think they have a similar issue to the sampler states. I have to check how and when to trigger an upload of the shader constants.
Right now, it sets the corresponding flag in the "set" structure of the stateblock to mark it dirty, and if it wasn't already set it adds the constant's index to a list of dirty constants to apply. When it's time to apply constants we just iterate over the list. In that respect shader constants already work similarly to how the new state management would work.
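A small sketch of the flag-plus-index-list scheme described above. `MAX_CONSTANTS`, `struct constant_state` and both functions are hypothetical; clearing the flag after the upload is an assumption made so the index can be queued again later.

```c
#include <assert.h>

typedef int BOOL;

#define MAX_CONSTANTS 256   /* hypothetical; the real limit depends on the card */

struct constant_state {
    float values[MAX_CONSTANTS][4];
    BOOL set[MAX_CONSTANTS];               /* flag marking a constant dirty */
    unsigned int dirty[MAX_CONSTANTS];     /* indices waiting to be applied */
    unsigned int dirty_count;
};

static void set_constant(struct constant_state *cs, unsigned int idx, const float v[4])
{
    unsigned int i;
    assert(idx < MAX_CONSTANTS);
    for (i = 0; i < 4; i++) cs->values[idx][i] = v[i];
    if (!cs->set[idx]) {                   /* queue each index only once */
        cs->set[idx] = 1;
        cs->dirty[cs->dirty_count++] = idx;
    }
}

/* Walk the dirty list; returns the number of uploads performed. */
static unsigned int apply_constants(struct constant_state *cs)
{
    unsigned int i, uploads = 0;
    for (i = 0; i < cs->dirty_count; i++) {
        unsigned int idx = cs->dirty[i];
        /* Real code would upload values[idx] to GL here. */
        cs->set[idx] = 0;
        uploads++;
    }
    cs->dirty_count = 0;
    return uploads;
}
```

The cost of applying is proportional to the number of constants that changed, not to `MAX_CONSTANTS`.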
One issue with shader constants is that in GLSL you set uniforms per program object, and it keeps the values last set, also when changing the currently used program. That means that in certain cases we might be setting constants to the same values as they were the last time we used the program.
I.e., if the program never changed, we should remove shader constants from the dirty list after they are applied. If a program was used that was never used with this device/stateblock before, we should set all the dirty constants again. But if a program was used that *was* used before, we would have to figure out which constants changed between now and the last time the program was used.
I'm not sure how much we would gain from fixing that though, depending on what a uniform is used for it might either change each frame or stay the same for a relatively long time.
Sampler states and bound textures are more difficult, as they are per-texture settings in gl and per device settings in d3d.
That doesn't sound entirely correct - there are per-texture-unit settings in GL, and per-sampler settings in d3d. I thought we had to map the samplers to GL texture units.
I mean things set with glTexParameter. The man page is not really clear on that, but I think the red book mentions that texture parameters are per texture object, and not per stage. I'll check that again.
They are per texture object. You did say per device rather than per sampler in the d3d case though. We do have to map texture units to samplers, but that's for texture stages, when register combiners are used.
Wrt sampler states <-> texture parameters: note that it's not unlikely for a texture to always be used with e.g. the same filter settings, while the sampler states might change. It might be worth it to keep track of which sampler states a texture (i.e., IWineD3DTextureImpl) was last used (i.e., applied to the GL texture object) with.
Hi,
One issue with shader constants is that in GLSL you set uniforms per program object, and it keeps the values last set, also when changing the currently used program. That means that in certain cases we might be setting constants to the same values as they were the last time we used the program.
Same issue as with the sampler states and texture objects. I am afraid the fastest way to deal with that is something like if(program->c[X] != stateblock->c[X]) { <set the gl constant>; program->c[X] = stateblock->c[X]; }
One way is to keep a dirty list per shader program, but that would be O(# programs), which is worse than the code above.
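The per-program comparison suggested above, written out as a sketch. The struct layouts, `NUM_CONSTANTS` and `sync_program_constants` are hypothetical; the values are compared bitwise with `memcmp`, so a constant is re-uploaded only when its exact bit pattern differs from what the program object last received.

```c
#include <assert.h>
#include <string.h>

#define NUM_CONSTANTS 8   /* hypothetical, kept small for the sketch */

struct program {
    float c[NUM_CONSTANTS][4];   /* values last uploaded to this program object */
    int valid[NUM_CONSTANTS];    /* 0 until the constant has been uploaded once */
};

struct stateblock {
    float c[NUM_CONSTANTS][4];   /* values the application set */
};

/* Upload only constants whose value differs from what the program holds. */
static unsigned int sync_program_constants(struct program *prog, const struct stateblock *sb)
{
    unsigned int x, uploads = 0;
    for (x = 0; x < NUM_CONSTANTS; x++) {
        if (prog->valid[x] && !memcmp(prog->c[x], sb->c[x], sizeof(prog->c[x])))
            continue;            /* program already holds this value */
        /* Real code would call glUniform4fv() for the uniform's location here. */
        memcpy(prog->c[x], sb->c[x], sizeof(prog->c[x]));
        prog->valid[x] = 1;
        uploads++;
    }
    return uploads;
}
```

The trade-off is a compare of every constant per draw, in exchange for never uploading a value the program already has.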
Wrt sampler states <-> texture parameters: note that it's not unlikely for a texture to always be used with e.g. the same filter settings, while the sampler states might change. It might be worth it to keep track of which sampler states a texture (i.e., IWineD3DTextureImpl) was last used (i.e., applied to the GL texture object) with.
Yup :-) The texture impl already stores the values, not sure what those are used for.
Lionel Ulmer voiced the idea of making comparisons a bit more efficient by e.g. packing some values into one DWORD. For example, the filter type needs only 3 bits, which allows us to put MIN-, MAG- and MIPFILTER into one DWORD and compare just that. We can do that or not, but I don't think it will gain much.
All we can do is manage as many states as possible with dirtification, and use verification before applying for the rest. Then we can make dirtification and verification as efficient as possible.
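A sketch of the packing idea combined with a per-texture cache, assuming (as the suggestion does) that each filter value fits in 3 bits. `pack_filters`, `struct texture` and `apply_filters` are hypothetical names; the cache is initialized to `~0u`, which no packed value can equal, so the first use always reaches GL.

```c
#include <assert.h>

typedef unsigned int DWORD;

/* Assumption: each filter type value fits in 3 bits (0..7). */
#define FILTER_BITS 3
#define FILTER_MASK ((1u << FILTER_BITS) - 1)

static DWORD pack_filters(DWORD minfilter, DWORD magfilter, DWORD mipfilter)
{
    assert(minfilter <= FILTER_MASK && magfilter <= FILTER_MASK && mipfilter <= FILTER_MASK);
    return minfilter | (magfilter << FILTER_BITS) | (mipfilter << (2 * FILTER_BITS));
}

/* Hypothetical per-texture record of the filters last applied to its GL object. */
struct texture {
    DWORD applied_filters;   /* initialize to ~0u: never equal to a packed value */
    int gl_updates;          /* counts updates in place of glTexParameteri() calls */
};

static void apply_filters(struct texture *tex, DWORD minf, DWORD magf, DWORD mipf)
{
    DWORD packed = pack_filters(minf, magf, mipf);
    if (packed == tex->applied_filters)
        return;              /* one compare instead of three */
    /* Real code would set GL_TEXTURE_MIN_FILTER / GL_TEXTURE_MAG_FILTER here. */
    tex->applied_filters = packed;
    tex->gl_updates++;
}
```

Using the same texture twice with unchanged filters causes one GL update; changing any of the three filters causes another.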
On 10/22/06, Ivan Gyurdiev ivg231@gmail.com wrote:
and we lose the ability to set up a constant table in the code.
The constant table is usually a bad idea, and this demonstrates why -
I'm in favor of flexibility, but is it correct to assume that the number of functions that won't work very well with the table is small ? In that case we could put some special number in the table that indicates something like "look in the switch statement instead" for those functions. This way we keep the flexibility, keep the switch statement small and keep the standard functions optimized.
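Jaap's "special number" suggestion can be sketched with NULL as the marker: table entries handle the common states, and a NULL entry falls through to a small switch. The state names, `apply_table` and `apply_state` are hypothetical, and the path/state variables only record what ran.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int DWORD;

enum { ST_LIGHTING, ST_ZENABLE, ST_FOGSTART, ST_COUNT };
enum { PATH_NONE, PATH_TABLE, PATH_SWITCH };

/* Records which path applied which state, in place of real GL calls. */
static int last_path = PATH_NONE;
static DWORD last_state;

static void apply_lighting(DWORD state) { last_path = PATH_TABLE; last_state = state; }
static void apply_zenable(DWORD state)  { last_path = PATH_TABLE; last_state = state; }

/* NULL means "look in the switch statement instead". */
static void (*const apply_table[ST_COUNT])(DWORD) = {
    [ST_LIGHTING] = apply_lighting,
    [ST_ZENABLE]  = apply_zenable,
    [ST_FOGSTART] = NULL,
};

static void apply_state(DWORD state)
{
    assert(state < ST_COUNT);
    if (apply_table[state]) {
        apply_table[state](state);
        return;
    }
    switch (state) {
    case ST_FOGSTART:
        last_path = PATH_SWITCH;
        last_state = state;
        break;
    default:
        assert(!"state has neither a table entry nor a switch case");
    }
}
```

The switch only has to list the exceptions, so it stays small while the common states keep the direct table lookup.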
Jaap Stolk wrote:
On 10/22/06, Ivan Gyurdiev ivg231@gmail.com wrote:
and we lose the ability to set up a constant table in the code.
The constant table is usually a bad idea, and this demonstrates why -
I'm in favor of flexibility, but is it correct to assume that the number of functions that won't work very well with the table is small ? In that case we could put some special number in the table that indicates something like "look in the switch statement instead" for those functions. This way we keep the flexibility, keep the switch statement small and keep the standard functions optimized.
I shouldn't have mentioned a switch statement, that was to illustrate an example.
My concern with this particular table is that at the moment it seems too narrow, targeted at render states [ and discussion shows adding sampler states isn't as straightforward as it seems. ]. I imagine a constant map from keys [ APPLY_SAMPLERS, APPLY_ZENABLE, ... ] to function pointers. I'm not quite sure why we're looking into adding pointers per sampler - doesn't really make sense to me. There's three things involved here: (1) how to store the D3D data, (2) how to store the GL data, and (3) how to apply the GL data. Discussion of anything "per-sampler" has to pertain to (1) or (2), but really has nothing to do with (3)...
For storing the D3D data, a different type of structure is necessary, I like Henri's tree ideas, maybe they can be used to enhance the current stateblock. That structure would be passed as an argument to the functions that the above table points to. A list of dirty states could specify which "apply" functions to execute, but you still need to pass in data to those functions, and the sampler number seems like it should be part of the data, not part of how things are indexed. To apply changes to sampler#6 you'd call the APPLY_SAMPLERS function, with a data arg that marks the 6th sampler dirty, and contains data for sampler 6.
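A sketch of the "sampler number is part of the data" idea: the D3D data carries a per-sampler dirty mask, and a single apply function walks the mask. `MAX_SAMPLERS`, `struct sampler_data`, `set_sampler_state` and `apply_samplers` are hypothetical. With one `unsigned int` as the mask this only covers up to 32 samplers; the 128 samplers mentioned for d3d10 would need an array of masks.

```c
#include <assert.h>

typedef unsigned int DWORD;

#define MAX_SAMPLERS 16        /* hypothetical limit */
#define NUM_SAMPLER_STATES 14  /* D3DSAMP_* values 1..13, index 0 unused */

/* D3D data: what the application set, plus which samplers changed. */
struct sampler_data {
    DWORD values[MAX_SAMPLERS][NUM_SAMPLER_STATES];
    unsigned int dirty_mask;   /* bit n set => sampler n needs applying */
};

static int applied[MAX_SAMPLERS];   /* counts applications in place of GL calls */

static void set_sampler_state(struct sampler_data *d, unsigned int sampler, DWORD type, DWORD value)
{
    assert(sampler < MAX_SAMPLERS && type < NUM_SAMPLER_STATES);
    d->values[sampler][type] = value;
    d->dirty_mask |= 1u << sampler;
    /* Real code would also put the single APPLY_SAMPLERS entry on the dirty list. */
}

/* One apply function for all samplers; which ones to touch comes from the data. */
static void apply_samplers(struct sampler_data *d)
{
    unsigned int sampler;
    for (sampler = 0; sampler < MAX_SAMPLERS; sampler++) {
        if (!(d->dirty_mask & (1u << sampler))) continue;
        /* Real code would push values[sampler][...] to the GL texture used by this sampler. */
        applied[sampler]++;
    }
    d->dirty_mask = 0;
}
```

Two state changes on sampler 6 lead to one application of sampler 6 and none of the others.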
States will be applied in drawprim. BltOverride and UnlockRect change states on their own, and they can put the states they change onto the dirty list so drawprim will reset them to what the application wants.
What about Clear()? Maybe you should also provide the capability of applying a single dirty state [ or a group of them ].
States will be applied in drawprim.
It will be good if *all* texture-related states were applied in drawprim, specifically. This is a prerequisite to VTF support, since that involves repacking pixel and vertex textures into a single array, and changing their indices [ should happen at drawprim, breaking any previously applied state on that texture ].
On Thursday 19 October 2006 13:35, Ivan Gyurdiev wrote:
States will be applied in drawprim.
It will be good if *all* texture-related states were applied in drawprim, specifically. This is a prerequisite to VTF support, since that involves repacking pixel and vertex textures into a single array, and changing their indices [ should happen at drawprim, breaking any previously applied state on that texture ].
That is the plan ;-)
On Thursday 19 October 2006 12:08, Stefan Dösinger wrote:
Hi, I considered that it would finally be time to get started with the state management rewrite :-) Just a mail with the final plan, if anyone has comments.
Well, I started with the thing, and I want to show the first results so others can better see where we are headed :-)
I named the file state.c, no idea why. At the moment it only contains a state management table, which is basically empty. The table contains a representative for each state (e.g. ALPHAFUNC is the representative for ALPHAFUNC and ALPHAREF, because both states affect the same gl state). Most states are their own representative for now; this will change when the stuff is implemented. The representatives will be used to group render states that depend on each other.
For each state there is a pointer to a function for applying the state. At the moment those pointers are NULL, but later this file will contain static functions for applying a specific state, referenced by the table. So a state can be applied by calling
States[STATE_RENDER(RenderState)]->func(IWineD3DStateBlockImpl *);
STATE_RENDER is defined as #define STATE_RENDER(a) (a)
For sampler states, ..., I will add a #define STATE_SAMPLER(b, a) STATE_RENDER(WINEHIGHEST_RENDER_STATE + WINEHIGHEST_SAMPLER_STATE * b + a) and so on. Similarly, if needed, a STATE_IS_SAMPLER, but I don't think we'll need that.
For the function that applies all the states in drawprim I was thinking about 2 ways: Using a pointer to the apply function in the table, or inline functions, aka
LIST_FOR_EACH(dirty_states_list) { switch(current_state_number) { case STATE_RENDER(D3DRS_LIGHTING): call_some_inline_function(); break; } }
This would have avoided a full call+ret for each state that is applied, but this switch block would grow terribly long: ~500 entries with all states in one list, or the equivalent number in separate switch blocks. I think a call is faster than checking against 500 constants, and it allows us to apply a single state with ease.
A for(i = 0; i < HIGHEST_STATE_ENTRY; i++) { if(States[i]->func) States[i]->func(Stateblock); } will allow us to record a full stateblock into an opengl display list :-)
Concerns: The state table will get pretty big, with some gaps. I want to allow finding a state by just indexing the array with the state number, without having to search for the state (yes, binary search is easily possible, but still). We can stuff the gaps with other states if we really, really want to, and code for searching might take more memory than the gaps in the list.
The list can get hard to maintain, one missing entry and all pointers go wrong... But we have to write it only once, d3d10 is unlikely to add more stuff as the fixed function pipeline was kicked.
Grouping the states: This should work nicely in general, but there is one concern about the vertex type, lighting, fog and vertex shaders: fog and lighting depend on the vertex type, and fog depends on whether a vertex shader is used. So D3DRS_LIGHTING, D3DRS_FOGSTART, D3DRS_FOGEND, D3DRS_FOGENABLE, D3DRS_FOGTABLEMODE, D3DRS_FOGVERTEXMODE, the vertex type and the bound vertex shader will be linked together. Pretty huge block... Any suggestions about breaking it up nicely?
More?
What are your plans for dealing with these: SetLight(), SetLightEnable(), SetTexture(), SetDepthStencilBuffer(), SetRenderTarget(), SetSomethingElseThatsNotARenderState()?
On Friday 20 October 2006 02:24, you wrote:
More?
What are your plans for dealing with these:
SetLight() SetLightEnable()
Depends on whether lights are shared with share lists. If yes, leave them as they are; otherwise put them on the list.
SetTexture()
Manage them on the dirty list, bound textures interact with texture stage states.
SetDepthStencilBuffer() SetRenderTarget()
Tricky. They will need context managing. The idea is to just change a few pointers in the setter functions and create the gl resources(fbos, pbuffers, contexts), and in drawprim search for the needed context, activate it and use that context's dirty states list
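A rough sketch of the context idea under stated assumptions: each hypothetical `struct context` keeps its own dirty flags, the setter only swaps a pointer, and drawprim looks up the context for the current target before applying that context's dirty states. Marking a state dirty in every context (`mark_dirty_all`) is an assumption added here, since each GL context keeps its own state. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONTEXTS 4
#define NUM_STATES 8   /* hypothetical */

struct context {
    const void *render_target;   /* surface this context draws to */
    int dirty[NUM_STATES];       /* per-context dirty flags */
};

struct device {
    struct context contexts[MAX_CONTEXTS];
    unsigned int num_contexts;
    const void *render_target;   /* changed by SetRenderTarget, nothing else */
};

/* SetRenderTarget only swaps a pointer; no GL work happens here. */
static void set_render_target(struct device *dev, const void *surface)
{
    dev->render_target = surface;
}

/* Assumption: a state change must reach every context. */
static void mark_dirty_all(struct device *dev, unsigned int state)
{
    unsigned int i;
    assert(state < NUM_STATES);
    for (i = 0; i < dev->num_contexts; i++) dev->contexts[i].dirty[state] = 1;
}

static struct context *find_context(struct device *dev)
{
    unsigned int i;
    for (i = 0; i < dev->num_contexts; i++)
        if (dev->contexts[i].render_target == dev->render_target) return &dev->contexts[i];
    return NULL;   /* real code would create a context / pbuffer / FBO here */
}

/* drawprim: pick the context, then apply its dirty states; returns how many. */
static unsigned int draw_prim(struct device *dev)
{
    unsigned int s, applied = 0;
    struct context *ctx = find_context(dev);
    if (!ctx) return 0;
    /* Real code would make ctx current before applying states. */
    for (s = 0; s < NUM_STATES; s++)
        if (ctx->dirty[s]) { ctx->dirty[s] = 0; applied++; }
    return applied;
}
```

After one state change, drawing to target A applies it in A's context, and switching to target B applies it again in B's context.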
SetSomethingElseThatsNotARenderState().
SetTransform and SetViewport come to mind. Put them onto the dirty state list too, as they interact with the vertex type. (Some matrices do, some don't; we can deal with each matrix separately.)