On Friday, 11 April 2008 00:31:13, Stefan Dösinger wrote:
The drawback of this is that now the workaround for the mesa atifs bug minimally affects nvidia cards too. It's a bit tricky to design around driver bugs :-/
Use this patch instead; the last one had a small but pretty bad bug that broke GLSL
On 11/04/2008, Stefan Dösinger stefan@codeweavers.com wrote:
On Friday, 11 April 2008 00:31:13, Stefan Dösinger wrote:
The drawback of this is that now the workaround for the mesa atifs bug minimally affects nvidia cards too. It's a bit tricky to design around driver bugs :-/
Use this patch instead; the last one had a small but pretty bad bug that broke GLSL
This patch doesn't make sense at all. Wasn't ATIFS supposed to be an implementation of fixed function fragment processing for ATI cards? In that case it doesn't make sense to only use it on cards that don't support ARB or GLSL. That will only make newer ATI cards have worse fixed function fragment processing (assuming ATIFS works as advertised). Of course what it comes down to again is that the shader backend isn't the right place to implement this. (Not to mention the issue of making a workaround for a Mesa bug part of the shader infrastructure... that bug should be fixed in Mesa.)
- void (*shader_fragment_enable)(IWineD3DDevice *iface, BOOL enable);
The name "shader_fragment_enable" is ambiguous at best within the context of a shader backend.
On Friday, 11 April 2008 08:58:05, H. Verbeet wrote:
First of all: We really want that patch to go in, Wine is currently badly broken on fglrx.
This patch doesn't make sense at all. Wasn't ATIFS supposed to be an implementation of fixed function fragment processing for ATI cards? In that case it doesn't make sense to only use it on cards that don't support ARB or GLSL. That will only make newer ATI cards have worse fixed function fragment processing (assuming ATIFS works as advertised). Of course what it comes down to again is that the shader backend isn't the right place to implement this.
We need that in a shader backend if we are ever going to implement pixel shaders using atifs. We also have to make nvts/nvrc a shader backend if we ever implement pixel shaders using that extension.
As for the worse fragment processing on newer ATI cards: We do not want to use ATIFS on those cards anyway, because ATIFS has a hardcoded limitation of 6 simultaneous textures, whereas radeon 9500+ cards support 8 textures with regular OpenGL fixed function processing. Also fglrx supports GL_ATI_envmap_bumpmap on those cards, so we have bump mapping without ATIFS as well (still missing stuff like specular color input and rare blending parameters though). For newer ATI cards we need an ARBFP / GLSL pipeline replacement, no matter what we do with ATIFS.
Remember, my original patchset used ATIFS with newer cards as well. Ivan did not like the 3 inherited atifs backends, so I've limited ATIFS to its final use in a 'perfect' wine that has an ARBFP and GLSL replacement.
(Not to mention the issue of making a workaround for a Mesa bug part of the shader infrastructure... that bug should be fixed in Mesa.)
It is fixed there already, but we're using the workaround to allow users of older Mesa versions to run the tests without crashing. If it's just about me, we can take the workaround out again; I have the newest Mesa code...
On 11/04/2008, Stefan Dösinger stefan@codeweavers.com wrote:
This patch doesn't make sense at all. Wasn't ATIFS supposed to be an implementation of fixed function fragment processing for ATI cards? In that case it doesn't make sense to only use it on cards that don't support ARB or GLSL. That will only make newer ATI cards have worse fixed function fragment processing (assuming ATIFS works as advertised). Of course what it comes down to again is that the shader backend isn't the right place to implement this.
We need that in a shader backend if we are ever going to implement pixel shaders using atifs. We also have to make nvts/nvrc a shader backend if we ever implement pixel shaders using that extension.
I'm not convinced we should lump that together in the shader backend just because it uses the same GL extension, but I guess we've been through that discussion. Fact is that we don't currently implement pixel shaders using atifs, and nobody knows how long it may take before we will, if ever.
As for the worse fragment processing on newer ATI cards: We do not want to use ATIFS on those cards anyway, because ATIFS has a hardcoded limitation of 6 simultaneous textures, whereas radeon 9500+ cards support 8 textures with regular OpenGL fixed function processing. Also fglrx supports GL_ATI_envmap_bumpmap on those cards, so we have bump mapping without ATIFS as well (still missing stuff like specular color input and rare blending parameters though). For newer ATI cards we need an ARBFP / GLSL pipeline replacement, no matter what we do with ATIFS.
I think in practice the specular color input is probably more important than the two extra textures. It's pretty rare for programs to hit the nvrc limit of 4, and if they do they don't usually break as badly as when they try to use the missing specular input.
First of all: We really want that patch to go in, Wine is currently badly broken on fglrx.
Yes it's broken, but I'm not convinced this is the right way to fix it. (I'd be fine with committing this just to fix the breakage, but I do think this is simply more evidence that the atifs stuff is in the wrong place.) Ultimately I'm just giving my opinion of course, it's up to you and Alexandre to decide what to do with that.
On Friday, 11 April 2008 10:39:13, H. Verbeet wrote:
I think in practice the specular color input is probably more important than the two extra textures. It's pretty rare for programs to hit the nvrc limit of 4, and if they do they don't usually break as badly as when they try to use the missing specular input.
Not really, many games check the MaxSimultaneousTextures cap and use that value even if they are using pixel shaders. E.g. HL2 fails in dxlevel 81 with ARB shaders because it checks that cap if it finds SM 1.x, and Age of Empires 3 fails as well. Call of Duty 4 is another popular example; it requires 8 fixed function textures. (COD4 wants TSSARGTEMP as well, so that point is moot)
On 11/04/2008, Stefan Dösinger stefan@codeweavers.com wrote:
Not really, many games check the MaxSimultaneousTextures cap and use that value even if they are using pixel shaders. E.g. HL2 fails in dxlevel 81 with ARB shaders because it checks that cap if it finds SM 1.x, and Age of Empires 3 fails as well. Call of Duty 4 is another popular example; it requires 8 fixed function textures. (COD4 wants TSSARGTEMP as well, so that point is moot)
Sure, but they don't generally break if you just return 8 there. (Compared to eg. HL2/CSS in dxlevel 70 without D3DTA_SPECULAR). Of course the proper way to get around all of the issues at the same time would be to write a real GLSL fixed function fragment processing implementation, but that doesn't change the fact that the current situation is pretty ugly.
On Friday, 11 April 2008 16:10:49, H. Verbeet wrote:
Sure, but they don't generally break if you just return 8 there. (Compared to eg. HL2/CSS in dxlevel 70 without D3DTA_SPECULAR). Of course the proper way to get around all of the issues at the same time would be to write a real GLSL fixed function fragment processing implementation, but that doesn't change the fact that the current situation is pretty ugly.
I don't think it's the D3DTA_SPECULAR that breaks things. HL2 doesn't use it. If it finds pixel shader support advertised it suddenly works without D3DTA_SPECULAR support and does not use pixel shaders either, so I think there's a different capability flag issue there (or HL2 is just broken on cards without pixel shader support nowadays).
Alexandre didn't commit the patch, so I think we should come to an agreement on this issue; otherwise it is going to come up again and again.
The requirements I have in mind are these:
1) The state table should be selectable based on the available opengl features and possibly registry settings. I think we all agree on that.
2) We want a fixed function vertex and fragment pipeline replacement with ARB and GLSL.
3) We want to be able to support pixel shaders with atifs and nvts. I don't know if that will ever be implemented, but if we choose a design that makes this hard or impossible, that's not going to help.
4) Even if we have an ARB/GLSL replacement, it should be possible to use D3D shaders but still use the opengl fixed function pipeline.
5) A nice to have is to be able to use the replacement pipelines together with shaders, but that is not a hard requirement for me. We need an ARB and GLSL replacement anyway.
---
I understand Henri's suggestion this way: Don't put the state table into the shader backend (*), but select it separately. This way the ATIFS pipeline replacement doesn't have to be a shader model. That way we can choose the ARB shader backend and ATIFS state table on r200 cards, and use GLSL and ATIFS on newer cards.
That way we get full shader support and still the advantages of the pipeline replacement without using 3 inherited shader backends. Then the state table could have some backend to enable / disable ATIFS like in this patch. (The bottom line of the patch here is that we should enable the extension we *use*, and not what happens to be available).
The state table would report the fixed function caps it has, and the shader backend reports the programmable pipeline caps. GetDeviceCaps has to collect the caps from multiple places, but that isn't an issue IMO.
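To illustrate, GetDeviceCaps could merge the two like this (a rough sketch, all names invented):

typedef unsigned long DWORD;

struct ffp_caps    { DWORD max_simultaneous_textures; DWORD texture_op_caps; };
struct shader_caps { DWORD vs_version; DWORD ps_version; };

struct fragment_pipeline { void (*get_caps)(struct ffp_caps *caps); };
struct shader_backend    { void (*get_caps)(struct shader_caps *caps); };

struct d3d_caps
{
    DWORD MaxSimultaneousTextures;
    DWORD TextureOpCaps;
    DWORD VertexShaderVersion;
    DWORD PixelShaderVersion;
};

void get_device_caps(const struct fragment_pipeline *ffp,
                     const struct shader_backend *shader,
                     struct d3d_caps *caps)
{
    struct ffp_caps f;
    struct shader_caps s;

    ffp->get_caps(&f);    /* fixed function caps from the state table side */
    shader->get_caps(&s); /* programmable pipeline caps from the shader backend */

    caps->MaxSimultaneousTextures = f.max_simultaneous_textures;
    caps->TextureOpCaps           = f.texture_op_caps;
    caps->VertexShaderVersion     = s.vs_version;
    caps->PixelShaderVersion      = s.ps_version;
}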
Is that correct, or did I misunderstand something?
However, I see two issues with that:
*) In which place do we decide which program/shader to use in the end, if we have both an ARB fixed function replacement program and an ARB program generated from a D3D shader? Currently this happens to work because ARB/GLSL override NVTS and NVRC, but this fails if we have a shader and ffp implementation using the same extension
*) What do we do if a shader implementation wants to overwrite some states for its programmable pipeline work? For example take my GLSL clipplane patch: The ARB shader backend disables clipplanes when a shader is used. This example is about a driver bug, but there are others as well: ATIFS and NVTS with a pixel shader implementation will have to override a part of the sampler setting (the stuff that is currently done in activate_dimensions)
How would you deal with this?
Stefan
(*): Ivan and I agreed in a discussion on IRC that "shader backend" is not a good name. The shader backends today are more a pipeline implementation than a shader implementation, and should be renamed accordingly. I did not rename them yet because I am not 100% happy with the name "pipeline" yet.
On Friday 11 April 2008 07:18:49 am Stefan Dösinger wrote:
Alexandre didn't commit the patch, I think we should come to an agreement on this issue, otherwise it is going to come up again and again.
For my own 2 cents on the issue, I think the fixed-function replacements should work through the selected shader backend. For example, if someone has GLSL enabled, the fixed-function replacement should be done through GLSL, and if ARB is used, it should be done through ARB shaders. In this way, the atifs fixed-function replacement should never be used when the ARB or GLSL shader backend is used (one shader mode shouldn't supersede another while the program is running).
This would ensure compatibility on the OpenGL side when D3D's fixed-function pipeline is mixed with D3D shaders, still leave room for a proper atifs/nvts shader backend, and not introduce problems related to how one shader backend might interact with another. It honestly doesn't make sense to me why wined3d should just jump in and start using atifs/nvts shaders for fixed-function replacements, when the selected shader mode is ARB or GLSL. And really, atifs/nvts should be last-ditch resorts because they are only available on selected vendors, while ARB and GLSL are more widespread (eg. on Intel, Mesa).
Ideally, in my mind, wined3d should autoselect one shader mode (unless a registry option is present to force one). It would be selected based on availability:
* if nv vertex/fragment programs are available, they should be used as a supplement to the ARB shader backend (giving up to SM3.0 (4.0, actually), with functionality that most closely follows D3D)
* else if GLSL is available, that should be used (giving SM2.0/3.0)
* else if ARB shaders are available, then they should be used (giving PS1.1 and VS1.4)
* else if atifs/nvts are available, use them (giving SM1.x)
* else disable D3D shaders
In all cases, it should pick one and stick with it through the life of the process, for both D3D shaders and D3D fixed-function. Don't use atifs if ARB is available, don't use GLSL if nv-supplemented ARB is available, etc.
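Something like this hypothetical selection logic (the gl_info fields stand in for the real extension flags):

/* Hypothetical autoselection following the priority list above. */
struct gl_info
{
    int nv_programs;  /* NV_vertex_program / NV_fragment_program family */
    int glsl;
    int arb_programs; /* ARB_vertex_program / ARB_fragment_program */
    int atifs_nvts;   /* ATI_fragment_shader or NV_texture_shader */
};

enum shader_mode
{
    MODE_ARB_NV,      /* ARB programs supplemented with NV extensions */
    MODE_GLSL,
    MODE_ARB,
    MODE_ATIFS_NVTS,
    MODE_NONE         /* no D3D shader support */
};

enum shader_mode select_shader_mode(const struct gl_info *gl)
{
    if (gl->nv_programs)  return MODE_ARB_NV;
    if (gl->glsl)         return MODE_GLSL;
    if (gl->arb_programs) return MODE_ARB;
    if (gl->atifs_nvts)   return MODE_ATIFS_NVTS;
    return MODE_NONE;
}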
Though, perhaps I'm misunderstanding the whole issue, and I'm spouting nonsense...
On Friday, 11 April 2008 17:22:20, Chris Robinson wrote:
In all cases, it should pick one and stick with it through the life of the process, for both D3D shaders and D3D fixed-function. Don't use atifs if ARB is available, don't use GLSL if nv-supplemented ARB is available, etc.
There are three issues that lead to this discussion:
*) We may want to mix shader backends. GLSL is a pain on MacOS, at least on the vertex side, but it works okish on the fragment shader side. It may be helpful to compile an ARB vertex shader and a GLSL pixel shader. Also an ARB replacement may be faster there because the GLSL compiler on MacOS often creates slow code.
*) We have an NVTS and ATIFS replacement now, but no ARB or GLSL fragment processing code, and it would be cool if we can make use of the ATIFS code on newer cards as well. NVTS currently works because it is included in the baseshader/fixed function backend.
*) It's also about getting the design right in principle, and finding a design that doesn't need an entire rework when dx10 and geometry shaders are introduced.
On Friday 11 April 2008 08:35:21 am Stefan Dösinger wrote:
*) We may want to mix shader backends. GLSL is a pain on MacOS, at least on the vertex side, but it works okish on the fragment shader side. It may be helpful to compile an ARB vertex shader and a GLSL pixel shader. Also an ARB replacement may be faster there because the GLSL compiler on MacOS often creates slow code.
As far as I can see, OSX's broken drivers are the only reason you'd not want to use GLSL when nothing else of better capability is available. For that case, there's two options: for nVidia users, extend ARB shaders with nv programs and use them (likely surpassing GLSL performance), and for other cards, bug Apple for proper drivers. IMHO, I don't think WineD3D should be that hindered and cause WineD3D issues with other vendors because Apple insists on making drivers themselves, and doing a poor job at it.
*) We have an NVTS and ATIFS replacement now, but no ARB or GLSL fragment processing code, and it would be cool if we can make use of the ATIFS code on newer cards as well. NVTS currently works because it is included in the baseshader/fixed function backend.
Why would it be cool to use ATIFS on newer cards? It'd be cool if we had something lean, sleek, and working. Mixing output shader modes is just asking for problems, IMO.
*) It's also about getting the design right in principle, and finding a design that doesn't need an entire rework when dx10 and geometry shaders are introduced.
As it is, I imagine WineD3D will need to be reworked for D3D10 no matter what we do here. Unless someone has a working design plan for what changes are needed by D3D10 for WineD3D, we're ultimately just guessing about what may work best (not to mention it'll be using GL3 instead of the current stuff, so whatever's picked may not even be efficient API-wise for it).
On Friday, 11 April 2008 20:41:57, Chris Robinson wrote:
As far as I can see, OSX's broken drivers are the only reason you'd not want to use GLSL when nothing else of better capability is available. For that case, there's two options: for nVidia users, extend ARB shaders with nv programs and use them (likely surpassing GLSL performance), and for other cards, bug Apple for proper drivers. IMHO, I don't think WineD3D should be that hindered and cause WineD3D issues with other vendors because Apple insists on making drivers themselves, and doing a poor job at it.
I don't know who exactly writes the Apple drivers, but the Mac drivers share a few issues with the Linux ATI driver.
We have to face that driver bugs are reality. I think we are having more issues in the form of user complaints due to the driver<->wined3d connection than the wined3d<->application one. I doubt Apple is going to fix their vertex shader bugs anytime soon. They come up weekly or monthly on their development lists, with no official statement yet.
Why would it be cool to use ATIFS on newer cards? It'd be cool if we had something lean, sleek, and working. Mixing output shader modes is just asking for problems, IMO.
For one part it is helpful for testing, and it is a temporary solution until we have an ARB and GLSL pipeline replacement (post 1.0 most likely).
As it is, I imagine WineD3D will need to be reworked for D3D10 no matter what we do here. Unless someone has a working design plan for what changes are needed by D3D10 for WineD3D, we're ultimately just guessing about what may work best (not to mention it'll be using GL3 instead of the current stuff, so whatever's picked may not even be efficient API-wise for it).
I have a rough design plan:
-> Replace the current d3d9 shader language with an intermediate language generated by the shader compiler and assembler library I am working on (see the other thread). d3d9 will feed a d3d9 shader into that lib and get a parsed shader back that it passes to wined3d; d3d10 works similarly.
-> Extending pixel shader and vertex shader support to 4.0 is fairly easy with whatever design we choose. Just extend the existing code.
-> Adding geometry shaders should work with the all-in-one shader backend. Add geometry shaders next to pixel and vertex and link all 3 together. I don't know the details with Henri's suggestion, but I think we'd just add an extra geometry state handler and extend the shader backend (or add a new shader backend if we split things up)
-> OpenGL 3.0 needs the finalized GL 3 spec. If GLSL is unmodified (I think it is), we can just reuse the existing GLSL shader backend. Otherwise we have to create a new one, and depending on how similar they are, maintain it as a separate backend or override a few parts via inheritance.
-> The non-shader parts of GL3 need a separation of the management code and GL specific code. For the state setters, we either need an entirely different set of state setters, or we can reuse a fully-GLSLed pipeline implementation. For textures, surfaces, buffers, devices we can move the management code into a base class and inherit a GL2 and GL3 class, similarly to BaseSurface-D3DSurface-GDISurface
-> A major API rewrite in D3D10 is the new resource model. Basically we have to make IndexBuffers, VertexBuffers and Surfaces general "Buffers". IB and VB can simply be replaced by the buffer on the d3d <= 9 side; for surfaces we have to create a derived class where we add D3D <= 9 surface specific methods like Blt, the container texture, palettes, etc. Textures should be replaced with the more flexible ShaderResourceView objects. (I don't know if one buffer can be used in more than one resource view object. If so, that might be an issue, as it means that a surface can be part of many textures, e.g. as a mip level.)
-> Vertex declarations? They are some kind of shader resource view as well now; I have to look at them in more detail. Maybe we can just reuse them unmodified.
-> Pixel formats: We need tests to find out what the d3d10 formats need, but basically replace the WINED3DFORMAT enum with a d3d10-like one and add private values for d3d <= 9 formats missing in d3d10 like P8, X4R4G4B4. The DXGI_FORMATs are nicer than the D3D9 ones IMO.
-> One open issue: Where do we implement sRGB read switching? In WineD3D or D3D9?
There will surely be more open issues, this is just a rough plan.
On 12/04/2008, Stefan Dösinger stefan@codeweavers.com wrote:
We have to face that driver bugs are reality. I think we are having more issues in the form of user complaints due to the driver<->wined3d connection than the wined3d<->application one. I doubt Apple is going to fix their vertex shader bugs anytime soon. They come up weekly or monthly on their development lists, with no official statement yet.
This is probably more of a concern to CW than Wine in general, but I guess it's a somewhat valid issue. The issue then becomes more one of how much ugliness we're willing to accept in the Wine tree in order to work around obviously broken drivers.
As it is, I imagine WineD3D will need to be reworked for D3D10 no matter what we do here. Unless someone has a working design plan for what changes are needed by D3D10 for WineD3D, we're ultimately just guessing about what may work best (not to mention it'll be using GL3 instead of the current stuff, so whatever's picked may not even be efficient API-wise for it).
I have a rough design plan:
-> Replace the current d3d9 shader language with an intermediate language generated by the shader compiler and assembler library I am working on (see the other thread). d3d9 will feed a d3d9 shader into that lib and get a parsed shader back that it passes to wined3d; d3d10 works similarly.
I thought we decided on IRC that it would make more sense to call the shader library from wined3d rather than d3d8, d3d9 and d3d10.
-> Adding geometry shaders should work with the all-in-one shader backend. Add geometry shaders next to pixel and vertex and link all 3 together. I don't know the details with Henri's suggestion, but I think we'd just add an extra geometry state handler and extend the shader backend (or add a new shader backend if we split things up)
A geometry state handler would only make sense if you would want to add a "fixed function geometry processing replacement". Just supporting the shaders only requires adding them to the shader backend (split up or not).
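For illustration, that could be as simple as growing the existing entry point by one flag (a hypothetical signature, along the lines of the shader_select() entry point elsewhere in this thread):

typedef int BOOL; /* stand-in for the windef.h type */

struct shader_backend_t
{
    /* the existing vertex/pixel flags simply grow a geometry flag */
    void (*shader_select)(void *data, BOOL usePS, BOOL useVS, BOOL useGS);
};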
-> OpenGL 3.0 needs the finalized GL 3 spec. If GLSL is unmodified (I think it is), we can just reuse the existing GLSL shader backend. Otherwise we have to create a new one, and depending on how similar they are, maintain it as a separate backend or override a few parts via inheritance.
AFAIK GLSL will be mostly unchanged, although supposedly it will also allow things like bindable uniforms and binding separate programs for vertex/fragment/geometry stages. This would allow us to get rid of our main use of GLSL linking.
-> The non-shader parts of GL3 need a separation of the management code and GL specific code. For the state setters, we either need an entirely different set of state setters, or we can reuse a fully-GLSLed pipeline implementation. For textures, surfaces, buffers, devices we can move the management code into a base class and inherit a GL2 and GL3 class, similarly to BaseSurface-D3DSurface-GDISurface
I think going the GLSL way makes sense there. Anything supporting GL3 should fully support GLSL already, and I'm not even sure GL3 will support a traditional fixed function pipeline.
On Saturday, 12 April 2008 05:07:49, H. Verbeet wrote:
I thought we decided on IRC that it would make more sense to call the shader library from wined3d rather than d3d8, d3d9 and d3d10.
From where we call the shader library isn't all that important to me. For me it's mainly important that the shader library parses the d3d9 and d3d10 shaders and that the wined3d GL shader generator works on Wine's internal shader representation.
I think going the GLSL way makes sense there. Anything supporting GL3 should fully support GLSL already, and I'm not even sure GL3 will support a traditional fixed function pipeline.
What I meant is that it depends on how shaders will look in GL 3.0. If it is GLSL as it is now, we can just use the GLSL shader backend. If it is vastly different we'll need a new one.
On 11/04/2008, Stefan Dösinger stefan@codeweavers.com wrote:
- The state table should be selectable based on the available opengl features and possibly registry settings. I think we all agree on that
Up to the level of being able to use different state handlers in different situations. I don't necessarily agree with copying and swapping the entire table in one piece.
- We want a fixed function vertex and fragment pipeline replacement with ARB and GLSL
Only GLSL is a requirement for me. ARB could be nice, but is probably redundant.
- We want to be able to support pixel shaders with atifs and nvts. I don't know if that will ever be implemented, but if we choose a design that makes this hard or impossible that's not going to help
This is not a hard requirement for me, although I certainly think we should be able to create a design that allows for this.
- Even if we have an ARB/GLSL replacement, it should be possible to use D3D shaders but still use the opengl fixed function pipeline
Agreed.
- A nice to have is to be able to use the replacement pipelines together with shaders, but that is not a hard requirement for me. We need an ARB and GLSL replacement anyway.
I assume you mean atifs & nvrc specifically here, in which case this is probably a bit more important than "nice to have". The most tricky situation to support here will be cards that support vertex shaders, but not fragment shaders. If we want to support pixel shaders using atifs / nvrc we have to allow mixing GLSL / ARB with nvrc / atifs for it to be of any use.
I understand Henri's suggestion this way: Don't put the state table into the shader backend (*), but select it separately. This way the ATIFS pipeline replacement doesn't have to be a shader model. That way we can choose the ARB shader backend and ATIFS state table on r200 cards, and use GLSL and ATIFS on newer cards.
That way we get full shader support and still the advantages of the pipeline replacement without using 3 inherited shader backends. Then the state table could have some backend to enable / disable ATIFS like in this patch. (The bottom line of the patch here is that we should enable the extension we *use*, and not what happens to be available).
The state table would report the fixed function caps it has, and the shader backend reports the programmable pipeline caps. GetDeviceCaps has to collect the caps from multiple places, but that isn't an issue IMO.
Is that correct, or did I misunderstand something?
My main point is that a fixed function replacement and the shader backend should be two different things, at least interface wise. My suggestion for an implementation would be something like this:
- Split the state table in three pieces: vertex processing states (eg. lighting, materials, etc), fragment processing states (eg. texture stage stuff) and other states (eg. blending states). Aside from being clearer this allows you to swap these parts out independently from each other and possibly skip applying them as a whole in case a shader is active for that part of the pipeline (I imagine this could have some performance advantages as well, although I'm not sure how much).
- Allow the state table more explicitly to have some state of its own. It currently has the list of dirty states of course, but it's more managed by the device than by the statetable as such. In effect this would introduce a state management object.
The basic setup would be something like this:
struct shader_backend_t
{
    void (*shader_select)(void *data, BOOL usePS, BOOL useVS);
    void (*shader_load_constants)(void *data, char usePS, char useVS);
    ...
};
struct state_management_t
{
    void (*mark_state_dirty)(void *data, DWORD state);
    void (*apply_states)(void *data);
    ...
};
struct IWineD3DDeviceImpl
{
    ...
    const struct state_management_t *vertex_state_manager;
    const struct state_management_t *fragment_state_manager;
    const struct state_management_t *other_state_manager;
    const struct shader_backend_t *shader_backend;

    void *vertex_private_data;
    void *fragment_private_data;
    void *other_private_data;
    void *shader_private_data;
    ...
};
/* Usage */
device->vertex_state_manager->mark_state_dirty(device->vertex_private_data, state);
device->fragment_state_manager->mark_state_dirty(device->fragment_private_data, state);
etc.
...
if (!use_vs) {
    device->vertex_state_manager->apply_states(device->vertex_private_data);
}
if (!use_ps) {
    device->fragment_state_manager->apply_states(device->fragment_private_data);
}
device->shader_backend->shader_select(device->shader_private_data, use_vs, use_ps);
Some example configurations:
GLSL FFP, GLSL shaders:
device = {&glsl_vsm, &glsl_fsm, &ff_osm, &glsl_shader_backend,
          glsl_private_data, glsl_private_data, ff_private_data, glsl_private_data};

Fixed function vertex, ATIFS fragment processing, GLSL shader backend:
device = {&ff_vsm, &atifs_fsm, &ff_osm, &glsl_shader_backend,
          ff_private_data, atifs_private_data, ff_private_data, glsl_private_data};

ARB vertex FFP, fixed function fragment, ARB shaders:
device = {&arb_vsm, &ff_fsm, &ff_osm, &arb_shader_backend,
          arb_ffp_private_data, ff_private_data, ff_private_data, arb_shader_private_data};
This doesn't support mixing eg. ARB vertex shaders with NVRC pixel shaders, but it would "simply" be a matter of splitting up the shader backend in a similar way to the state table. Important to note here is that the private data could be shared between fixed function replacements and the shader backend, like in the case of GLSL. I could imagine using a structure more similar to state_management_t for the shader backend as well.
Of course there are a number of variations you can make on this (perhaps most notably storing the call tables and private data in the same struct), but I think the important parts are the split between vertex/fragment/other/shader, and the state management having private data.
However, I see two issues with that:
*) In which place do we decide which program/shader to use in the end, if we have both an ARB fixed function replacement program and an ARB program generated from a D3D shader? Currently this happens to work because ARB/GLSL override NVTS and NVRC, but this fails if we have a shader and ffp implementation using the same extension
The private data between the FFP replacement and the shader backend can be shared. That means our general GLSL management stuff can see you've got eg. an FFP vertex shader and a pixel shader and link them together.
*) What do we do if a shader implementation wants to overwrite some states for its programmable pipeline work? For example take my GLSL clipplane patch: The ARB shader backend disables clipplanes when a shader is used. This example is about a driver bug, but there are others as well: ATIFS and NVTS with a pixel shader implementation will have to override a part of the sampler setting (the stuff that is currently done in activate_dimensions)
How would you deal with this?
I guess this will mostly be an issue for states that end up being in the "other" category. In general I think that as long as it's only for a couple of states we can just keep doing more or less what we currently do and check if the relevant state management backend is used.
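For instance (reusing the structs from the sketch above; the handler and backend names are made up):

/* A handler for a state in the "other" table applies a workaround only
 * when the atifs fragment backend is the active one. */
extern const struct state_management_t atifs_fragment_state_manager;

static void state_example_other(struct IWineD3DDeviceImpl *device)
{
    if (device->fragment_state_manager == &atifs_fragment_state_manager)
    {
        /* atifs-specific fixup */
    }
    /* common handling for all backends */
}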
On Friday, 11 April 2008 22:42:17, H. Verbeet wrote:
I did not yet read it in depth, so I am just replying to a few concerns I spotted. I'll look at it more in-depth tomorrow.
- The state table should be selectable based on the available opengl features and possibly registry settings. I think we all agree on that
Up to the level of being able to use different state handlers in different situations. I don't necessarily agree with copying and swapping the entire table in one piece.
Agreed, we just want to be able to swap states, in which way is open.
- We want a fixed function vertex and fragment pipeline replacement with ARB and GLSL
Only GLSL is a requirement for me. ARB could be nice, but is probably redundant.
Intel cards? Also GLSL has the problem with the link times.
- A nice to have is to be able to use the replacement pipelines together with shaders, but that is not a hard requirement for me. We need an ARB and GLSL replacement anyway.
I assume you mean atifs & nvrc specifically here, in which case this is probably a bit more important than "nice to have". The most tricky situation to support here will be cards that support vertex shaders, but not fragment shaders. If we want to support pixel shaders using atifs / nvrc we have to allow mixing GLSL / ARB with nvrc / atifs for it to be of any use.
The current implementation allows that; we're using ATIFS+ARBVP.
Aside from being clearer this allows you to swap these parts out independently from each other and possibly skip applying them as a whole in case a shader is active for that part of the pipeline (I imagine this could have some performance advantages as well, although I'm not sure how much).
For the performance, it depends on the app. If we skip applying (redundant) fixed function settings when a shader is used, we have to reapply all of them when the shader is deactivated, because they might have changed during shader use. It's a decision between making shader on-off switches cheap vs filtering out redundant state changes done by the app. Currently there is no additional fixed function pipeline cost involved when using shaders as long as the app doesn't touch the ffp states.
struct IWineD3DDeviceImpl
{
    ...
    const struct state_management_t *vertex_state_manager;
    const struct state_management_t *fragment_state_manager;
    const struct state_management_t *other_state_manager;
    const struct shader_backend_t *shader_backend;

    void *vertex_private_data;
    void *fragment_private_data;
    void *other_private_data;
    void *shader_private_data;
    ...
};
One of the reasons for using a single state table was that we have interconnections between the vertex states, fragment states and other states. For example, where would you put the vertex buffer loading? Is that an "other" state, or a vertex state? The FVF and the vertex shader influence whether we're entering the FF or shader vertex processing codepaths. The FVF affects fogging, which is a vertex state in some shader implementations (nvts, atifs), but implemented in pixel shaders in GLSL and ARB. The texture transform flags are a fixed function vertex state on paper, but we have to compile the D3DTTFF_PROJECTED flag into the fragment shader. (We already do so; that's documented on MSDN, so clearly a misdesign on Microsoft's side.) There are many more examples of vertex-fragment-other state interconnections.
This doesn't support mixing eg. ARB vertex shaders with NVRC pixel shaders, but it would "simply" be a matter of splitting up the shader backend in a similar way to the state table. Important to note here is that the private data could be shared between fixed function replacements and the shader backend, like in the case of GLSL. I could imagine using a structure more similar to state_management_t for the shader backend as well.
The unsplit shader backend is the current d3d shader implementation + Ivan's pipeline linker object. If you split them up, where do you set GLSL shader constants?
Also note that the GLSL pixel shaders depend on the type of vertex processing (non-GLSL or GLSL) to load the 3.0 varyings correctly. We could get rid of that by requiring GLSL vp, but that would break requirement 4, using the GL fixed function pipeline.
My concern is that if you split vertex processing and fragment processing up, you need a linker object that deals with linking. This linker object has to know about the vertex and fragment processing state handlers and tables, thus creating a special linker for each vs-ps-vff-ffp-other combination. I don't have any objections against that in principle, but I am afraid that due to the high interconnection between vertex, fragment and other states we would end up with implementing most things in the linker.
On 12/04/2008, Stefan Dösinger stefan@codeweavers.com wrote:
- We want a fixed function vertex and fragment pipeline replacement with ARB and GLSL
Only GLSL is a requirement for me. ARB could be nice, but is probably redundant.
Intel cards? Also GLSL has the problem with the link times.
I'm currently not too bothered about Intel cards, although that might change in the future. Either way, it's certainly possible to create an ARB implementation, it's more a matter of priority.
Aside from being clearer this allows you to swap these parts out independently from each other and possibly skip applying them as a whole in case a shader is active for that part of the pipeline (I imagine this could have some performance advantages as well, although I'm not sure how much).
For the performance, it depends on the app. If we skip applying (redundant) fixed function settings when a shader is used, we have to reapply all of them when the shader is deactivated, because they might have changed during shader use. It's a decision between making shader on-off switches cheap vs filtering out redundant state changes done by the app. Currently there is no additional fixed function pipeline cost involved when using shaders as long as the app doesn't touch the ffp states.
I'm not sure I follow your reasoning here. Just to be clear, I'm only talking about applying states, not marking them dirty. In the worst case it simply means you apply them when switching back to fixed function instead of in the current draw call, but you could potentially save some redundant state applications while the shader is active. However, what I was actually wondering about is if it might be more expensive on some drivers to do fixed function state changes while a shader is active.
One of the reasons for using a single state table was that we have interconnections between the vertex states, fragment states and other states. For example, where would you put the vertex buffer loading? Is that an "other" state, or a vertex state? The FVF and the vertex shader influence whether we're entering the FF or shader vertex processing codepaths. The FVF affects fogging, which is a vertex state in some shader implementations (nvts, atifs), but implemented in pixel shaders in GLSL and ARB. The texture transform flags are a fixed function vertex state on paper, but we have to compile the D3DTTFF_PROJECTED flag into the fragment shader. (We already do so; that's documented on MSDN, so clearly a misdesign on Microsoft's side.) There are many more examples of vertex-fragment-other state interconnections.
Anything that gets ignored when a vertex shader is active gets put in the vertex states, anything that gets ignored when a fragment shader is active should be part of the fragment states. Resource loading would be part of the "other" states. Most of the connections you mention appear to be connections on the D3D side; these would have no consequences for a separation on the GL side of things. In other words, it's perfectly valid for a state in the vertex block and a state in the fragment block to read from the same D3D state.
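A sketch of how that could look (hypothetical layout, just to illustrate the idea):

typedef unsigned long DWORD;
#define STATE_VDECL 0 /* hypothetical state id */

struct state_entry
{
    DWORD state;
    void (*apply)(void *private_data);
};

/* The same D3D state has a handler in both tables; each handler only
 * touches GL state on its own side of the pipeline. */
static void vdecl_vertex(void *data)   { /* matrices, RHW setup, ... */ }
static void vdecl_fragment(void *data) { /* texture transform fixups, ... */ }

static const struct state_entry vertex_state_table[]   = { { STATE_VDECL, vdecl_vertex } };
static const struct state_entry fragment_state_table[] = { { STATE_VDECL, vdecl_fragment } };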
This doesn't support mixing eg. ARB vertex shaders with NVRC pixel shaders, but it would "simply" be a matter of splitting up the shader backend in a similar way to the state table. Important to note here is that the private data could be shared between fixed function replacements and the shader backend, like in the case of GLSL. I could imagine using a structure more similar to state_management_t for the shader backend as well.
The unsplit shader backend is the current d3d shader implementation + Ivan's pipeline linker object. If you split them up, where do you set GLSL shader constants?
In the shader backend. Splitting things up doesn't change much there. You could decide to not even expose constant loading though, and instead just mark the state dirty on the shader backend. Deciding when to load constants then becomes the responsibility of the shader backend.
Also note that the GLSL pixel shaders depend on the type of vertex processing (non-GLSL or GLSL) to load the 3.0 varyings correctly. We could get rid of that by requiring GLSL vp, but that would break requirement 4, using the GL fixed function pipeline.
From the shader's point of view a pipeline replacement should be indistinguishable from real fixed function processing, other than that in case of GLSL you have to link it together into a single program.
My concern is that if you split vertex processing and fragment processing up, you need a linker object that deals with linking. This linker object has to know about the vertex and fragment processing state handlers and tables, thus creating a special linker for each vs-ps-vff-ffp-other combination. I don't have any objections against that in principle, but I am afraid that due to the high interconnection between vertex, fragment and other states we would end up with implementing most things in the linker.
It's a concern when mixing different shader backends, but I'm not sure there's a lot we can do about it there. For fixed function replacements it shouldn't matter though, because you always write to the predefined fixed function outputs. In that case linking is only an issue for GLSL FFP + GLSL shader, which is trivial to implement. I'd prefer to keep this discussion focused on fixed function implementations though. GL3, D3D10 and splitting up the shader backend are certainly interesting, but at this point they just distract from the core issues.
On Saturday, 12 April 2008 04:27:07, H. Verbeet wrote:
Anything that gets ignored when a vertex shader is active gets put in the vertex states, anything that gets ignored when a fragment shader is active should be part of the fragment states. Resource loading would be part of the "other" states.
GL named arrays get ignored when a vertex shader is in use, unless the shader explicitly uses them...
Most of the connections you mention appear to be connections on the D3D side; these would have no consequences for a separation on the GL side of things. In other words, it's perfectly valid for a state in the vertex block and a state in the fragment block to read from the same D3D state.
So if e.g. the vertex declaration is changed you would dirtify many states:
-> misc stream sources
-> vertex shader (use it or not?)
-> Fog
-> Fixed function vertex processing matrices (rhw vertices or not)
-> texture transforms
-> (a few others as well)
I have no problem with doing that, changing the vdecl is an expensive business no matter what we do, just asking to make sure I understand what you mean. How do you control which gl states are dirtified by which d3d state? This will depend on the combination of backends you use.
There are quite a few opengl connections as well, although they work differently. It's more the various interactions between shader extensions. In quite a few cases the fragment processing implementation has to configure the vertex processing correctly to feed it in the right way, and also the other way round.
For example, to stick to the texture transform flags. Let's consider we're using fixed function D3D vertex processing in whatever GL extension. Now enter fragment processing and D3DTTFF_PROJECTED:
-> With fixed function GL or an NVTS fixed function replacement we have to make sure that the 4th coordinate is 1.0 to disable the GL division if TTFF_PROJECTED is not set, and if it is set with TTFF_COUNT3, make sure that the 3rd coord is copied to the 4th
-> With a GLSL or ARB fixed function replacement we can handle the lack of TTFF_PROJECTED properly, but not COUNT3
-> With ATIFS we can handle everything properly in the replacement shader
-> With an ARB, GLSL or ATIFS D3D shader we don't need any special texture transform fixups
-> With an NVTS D3D shader we have to take care of disabling projected textures in vertex processing again
That means different fragment processing implementations have different vertex processing requirements. Now you could make that a flag in the fragment processing and pixel shader implementation. You'd need 4 flags (nonshader_unprojected, shader_unprojected, nonshader_count3, shader_count3). Are you sure the flags won't grow out of control?
Another example is fogging. Fog is overwritten by ARB and GLSL, but not ATIFS and NVTS (as far as I can see). Is fog a vertex or fragment state? How do you share the quite complex fog applying code between the ATIFS, NVTS and GL fixed function implementations if you make it a fragment state?
Also note that the GLSL pixel shaders depend on the type of vertex processing (non-GLSL or GLSL) to load the 3.0 varyings correctly. We could get rid of that by requiring GLSL vp, but that would break requirement 4, using the GL fixed function pipeline.
From the shader's point of view a pipeline replacement should be indistinguishable from real fixed function processing, other than that in case of GLSL you have to link it together into a single program.
PS 3.0 shaders read texcoords and colors from custom varyings, and D3DCOLOR0, 1, TEXCOORD0-7 are linked to their predecessors. That way a 3.0 pshader can interact with fixed function vertex processing and rhw drawing. (Fixed function vertex processing is broken on ATI on Windows, but I don't want to justify the design with an ATI driver bug; RHW+3.0 is used by e.g. Age of Empires 3.)
Where would you write the TEXCOORD0-7 and D3DCOLOR0 and 1 varyings from a GLSL vertex shader, and where do you read them from in the pixel shader? Keep indirect varying addressing in the pshader in mind.
It's a concern when mixing different shader backends, but I'm not sure there's a lot we can do about it there. For fixed function replacements it shouldn't matter though, because you always write to the predefined fixed function outputs. In that case linking is only an issue for GLSL FFP + GLSL shader, which is trivial to implement.
When we're not able to link ARBVP+ATIFS pixel shaders or ATIVP+NVTS, that defeats the point of implementing pixel shaders with them, so we have to do something about it.
From yesterday's mail:
The private data between the FFP replacement and the shader backend can be shared. That means our general GLSL management stuff can see you've got eg. an FFP vertex shader and a pixel shader and link them together.
I am using GLSL for pixel shaders. Can I share my private data with a GLSL fragment replacement, or do I have to worry that the fixed function fragment processing might be done using ATIFS? Not using GLSL pshaders + ATIFS was one of the symptoms for you that "the shader backend isn't the right place to implement this".
---
I'm currently not too bothered about Intel cards, although that might change in the future. Either way, it's certainly possible to create an ARB implementation, it's more a matter of priority.
They are pretty widespread and are even used in the Eee PC, so I think dealing with these cards will become a priority soon, at least for me. Unfortunately the driver sucks in terms of stability and performance.
On 12/04/2008, Stefan Dösinger stefan@codeweavers.com wrote:
On Saturday, 12 April 2008 04:27:07, H. Verbeet wrote:
Anything that gets ignored when a vertex shader is active gets put in the vertex states, anything that gets ignored when a fragment shader is active should be part of the fragment states. Resource loading would be part of the "other" states.
GL named arrays get ignored when a vertex shader is in use, unless the shader explicitly uses them...
It doesn't get ignored; you still do the upload and the data is still available should the shader choose to use it. Still, I probably should've phrased it as "functionality that gets replaced by a vertex / fragment shader".
Most of the connections you mention appear to be connections on the D3D side; these would have no consequences for a separation on the GL side of things. In other words, it's perfectly valid for a state in the vertex block and a state in the fragment block to read from the same D3D state.
So if e.g. the vertex declaration is changed you would dirtify many states:
-> misc stream sources
-> vertex shader (use it or not?)
-> Fog
-> Fixed function vertex processing matrices (rhw vertices or not)
-> texture transforms
-> (a few others as well)
I have no problem with doing that, changing the vdecl is an expensive business no matter what we do, just asking to make sure I understand what you mean. How do you control which gl states are dirtified by which d3d state? This will depend on the combination of backends you use.
I'm not sure about the exact splitup you're using here, but it would mean potentially dirtifying multiple states, yes. I could imagine it as simply dirtifying the vertexdeclaration state on both the vertex and fragment state tables.
There are quite a few opengl connections as well, although they work differently. It's more the various interactions between shader extensions. In quite a few cases the fragment processing implementation has to configure the vertex processing correctly to feed it in the right way, and also the other way round.
For example, to stick to the texture transform flags. Let's consider we're using fixed function D3D vertex processing in whatever GL extension. Now enter fragment processing and D3DTTFF_PROJECTED:
-> With fixed function GL or an NVTS fixed function replacement we have to make sure that the 4th coordinate is 1.0 to disable the GL division if TTFF_PROJECTED is not set, and if it is set with TTFF_COUNT3, make sure that the 3rd coord is copied to the 4th
-> With a GLSL or ARB fixed function replacement we can handle the lack of TTFF_PROJECTED properly, but not COUNT3
-> With ATIFS we can handle everything properly in the replacement shader
-> With an ARB, GLSL or ATIFS D3D shader we don't need any special texture transform fixups
-> With an NVTS D3D shader we have to take care of disabling projected textures in vertex processing again
That means different fragment processing implementations have different vertex processing requirements. Now you could make that a flag in the fragment processing and pixel shader implementation. You'd need 4 flags (nonshader_unprojected, shader_unprojected, nonshader_count3, shader_count3). Are you sure the flags won't grow out of control?
You only need two. One to toggle writing 1.0 to the 4th coordinate when needed, and one to toggle copying the 3rd coordinate to the 4th when needed. It would certainly beat doing an extension check for every possible backend. Right now we always do the fixup, so in that respect it would be an improvement as well.
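In code that could be as little as this (the names are made up):

typedef int BOOL; /* stand-in for the windef.h type */

/* Advertised by each fragment processing / pixel shader implementation. */
struct fragment_texcoord_fixups
{
    BOOL needs_proj_disable_fixup; /* force the 4th coordinate to 1.0 when
                                      TTFF_PROJECTED is not set */
    BOOL needs_count3_fixup;       /* copy the 3rd coordinate to the 4th for
                                      TTFF_PROJECTED | TTFF_COUNT3 */
};

/* The vertex side consults whichever fragment implementation is active. */
static void apply_texcoord_fixups(const struct fragment_texcoord_fixups *f,
                                  BOOL projected, BOOL count3)
{
    if (!projected && f->needs_proj_disable_fixup)
        ; /* write 1.0 to the 4th coordinate */
    else if (projected && count3 && f->needs_count3_fixup)
        ; /* copy the 3rd coordinate into the 4th */
}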
Another example is fogging. Fog is overwritten by ARB and GLSL, but not ATIFS and NVTS (as far as I can see). Is fog a vertex or fragment state? How do you share the quite complex fog applying code between the ATIFS, NVTS and GL fixed function implementations if you make it a fragment state?
That depends on the fog type. Vertex fog is a vertex state, fragment fog is a fragment state. Changing the type would obviously have interactions with both parts of the pipeline. As for applying the state, there's no reason different implementations can't call a common function in eg. utils.c to calculate things like the fog mode, type, start, end, etc. One could argue it doesn't belong in state_fog() in the first place.
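E.g. something along these lines (hypothetical names; the real code would also have to take the vertex declaration and shader properties into account):

typedef unsigned long DWORD;

struct fog_settings
{
    DWORD gl_fog_mode; /* GL_LINEAR / GL_EXP / GL_EXP2 equivalent */
    float start, end, density;
};

/* One place that turns the D3D fog render states into GL-ready values;
 * the GL FFP, atifs and nvts handlers would all call this instead of
 * duplicating the translation. */
void compute_fog_settings(DWORD table_mode, DWORD vertex_mode,
                          float start, float end, float density,
                          struct fog_settings *out)
{
    /* table (pixel) fog takes precedence over vertex fog, as in d3d;
     * D3DFOG_NONE is 0 */
    out->gl_fog_mode = table_mode ? table_mode : vertex_mode;
    out->start = start;
    out->end = end;
    out->density = density;
}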
From the shader's point of view a pipeline replacement should be indistinguishable from real fixed function processing, other than that in case of GLSL you have to link it together into a single program.
PS 3.0 shaders read texcoords and colors from custom varyings, and D3DCOLOR0, 1, TEXCOORD0-7 are linked to their predecessors. That way a 3.0 pshader can interact with fixed function vertex processing and rhw drawing. (Fixed function vertex processing is broken on ATI on Windows, but I don't want to justify the design with an ATI driver bug; RHW+3.0 is used by e.g. Age of Empires 3.)
Where would you write the TEXCOORD0-7 and D3DCOLOR0 and 1 varyings from a GLSL vertex shader, and where do you read them from in the pixel shader? Keep indirect varying addressing in the pshader in mind.
Just to be clear, with "GLSL vertex shader" you mean "GLSL vertex processing replacement", right? A vertex processing replacement shader would write to the regular fixed function output, i.e. gl_FrontColor, gl_FrontSecondaryColor, gl_TexCoord[], etc. The fragment shader would read them the same way as it does when paired with fixed function or pre-3.0 vertex shaders.
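For illustration, the kind of GLSL such a replacement could emit (a made-up fragment, stored as a C string the way the generated shaders are assembled; the real output would depend on the D3D states):

static const char ffp_vertex_replacement[] =
    "void main()\n"
    "{\n"
    "    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;\n"
    "    /* write the fixed function builtins, so any fragment stage,\n"
    "       GLSL or not, can pair with this shader */\n"
    "    gl_FrontColor = gl_Color;\n"
    "    gl_FrontSecondaryColor = gl_SecondaryColor;\n"
    "    gl_TexCoord[0] = gl_TextureMatrix[0] * gl_MultiTexCoord0;\n"
    "}\n";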
It's a concern when mixing different shader backends, but I'm not sure there's a lot we can do about it there. For fixed function replacements it shouldn't matter though, because you always write to the predefined fixed function outputs. In that case linking is only an issue for GLSL FFP + GLSL shader, which is trivial to implement.
When we're not able to link ARBVP+ATIFS pixel shaders or ATIVP+NVTS, that defeats the point of implementing pixel shaders with them, so we have to do something about it.
Actually, I'm not sure what I was thinking here. It should only be a concern when mixing two SM3.0+ shaders between different backends, which atifs and nvts are never going to support. The only combination that would need some kind of linking would be GLSL SM3.0 vertex shaders with GLSL SM3.0 fragment shaders, but we currently handle that and it could pretty much continue working the same way. ARBVP + ATIFS PS or ATIVP + NVTS PS should work without problems, if you split up the shader backend. I still think it's premature to worry too much about that at the moment, since it's a separate issue from how you split up the state management.
From yesterday's mail:
The private data between the FFP replacement and the shader backend can be shared. That means our general GLSL management stuff can see you've got eg. an FFP vertex shader and a pixel shader and link them together.
I am using GLSL for pixel shaders. Can I share my private data with a GLSL fragment replacement, or do I have to worry that the fixed function fragment processing might be done using ATIFS? Not using GLSL pshaders + ATIFS was one of the symptoms for you that "the shader backend isn't the right place to implement this".
Pixel shaders + fragment processing replacement doesn't make sense. Either a GLSL vertex processing replacement + GLSL pixel shader or a GLSL vertex shader + GLSL fragment processing replacement would work though. The "GLSL pipeline object" would know if it's being used as vertex and/or fragment replacement and link everything together. In case atifs is used no linking is required.
Basically you should think of the GLSL private data + vertex processing replacement + fragment processing replacement + shader backend as a single object exposing three different interfaces. Call it a "pipeline object" if you like.
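In struct form, the idea might look something like this (made-up names, reusing the structs from earlier in the thread):

/* One GLSL "pipeline object" exposing three interfaces that all share
 * the same private data (program cache, link bookkeeping, ...). */
struct glsl_pipeline
{
    const struct state_management_t *vertex_replacement;   /* GLSL vertex FFP   */
    const struct state_management_t *fragment_replacement; /* GLSL fragment FFP */
    const struct shader_backend_t *shader_backend;         /* GLSL D3D shaders  */
    void *private_data;                                    /* shared by all three */
};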
I'm currently not too bothered about Intel cards, although that might change in the future. Either way, it's certainly possible to create an ARB implementation, it's more a matter of priority.
They are pretty widespread and are even used in the Eee PC, so I think dealing with these cards will become a priority soon, at least for me. Unfortunately the driver sucks in terms of stability and performance.
Are those cards powerful enough to support a fixed function replacement?
On Saturday, 12 April 2008 15:06:38, H. Verbeet wrote:
It doesn't get ignored; you still do the upload and the data is still available should the shader choose to use it. Still, I probably should've phrased it as "functionality that gets replaced by a vertex / fragment shader".
But since D3D vertex shaders always read the numbered arrays and fixed function always reads the named arrays, the named arrays get de facto replaced as far as we're concerned.
That means different fragment processing implementations have different vertex processing requirements. Now you could make that a flag in the fragment processing and pixel shader implementation. You'd need 4 flags (nonshader_unprojected, shader_unprojected, nonshader_count3, shader_count3). Are you sure the flags won't grow out of control?
You only need two. One to toggle writing 1.0 to the 4th coordinate when needed, and one to toggle copying the 3rd coordinate to the 4th when needed. It would certainly beat doing an extension check for every possible backend. Right now we always do the fixup, so in that respect it would be an improvement as well.
I'm working on a patch that makes the atifs shader code take care of applying the texture transform matrix. That works naturally without any flags or backend check if you don't try to split vertex and fragment processing by force.
That depends on the fog type. Vertex fog is a vertex state, fragment fog is a fragment state. Changing the type would obviously have interactions with both parts of the pipeline.
On the GL side both vertex and fragment fog are applied to the same GL state. Using an ARBFP or GLSL fragment shader replaces vertex fog as well, so you'll have to implement both types in the fragment processing replacement.
The fog settings depend on the vertex decl(XYZ vs XYZRHW) and the shader properties("foggy shader"). That means the fragment processing code(pixel shader and ffp replacement) would have to look at core vertex processing settings. Doesn't that defeat separating them in the first place?
(I understand that you want to split the state types on the GL side, not the D3D side. But when you split applying of one D3D state in 3 pieces, I fail to see how it is cleaner)
You didn't answer how you plan to implement state dirtification. You have a SetRenderState call that changes render state X. Which implementation state(s) do you dirtify? I.e.:
device->fragment_state_manager->mark_state_dirty(device->fragment_private_data, state);
Where does "state" come from?
if (!use_vs) { device->vertex_state_manager->apply_states(device->vertex_private_data); } ...
This would be done where?
Where would you write the TEXCOORD0-7 and D3DCOLOR0 and 1 varyings from a GLSL vertex shader, and where do you read them from in the pixel shader? Keep indirect varying addressing in the pshader in mind.
Just to be clear, with "GLSL vertex shader" you mean "GLSL vertex processing replacement", right? A vertex processing replacement shader would write to the regular fixed function output, ie gl_FrontColor, gl_FrontSecondaryColor, gl_TexCoord[], etc. The fragment shader would read them the same way as it does when paired with fixed function or pre-3.0 vertex shaders.
No, I mean a D3D (3.0) vertex shader that is translated and running via GLSL. Currently the pixel shader decides where the vertex shader(all versions) writes the varyings to(generate_param_reorder_function). If a fixed function vertex replacement shader and pre-3.0 shader writes to the regular fixed function output, how would you run a pixel shader like the one in vshader_version_varying_test() in visual.c together with a 1.x or 2.0 vertex shader, or XYZRHW data from a GLSL vertex pipeline replacement shader? (With the fixed function GL vertex pipeline we're screwed, but we aren't necessarily screwed with a GLSL vertex pipeline replacement)
Pixel shaders + fragment processing replacement doesn't make sense. Either a GLSL vertex processing replacement + GLSL pixel shader or a GLSL vertex shader + GLSL fragment processing replacement would work though. The "GLSL pipeline object" would know if it's being used as vertex and/or fragment replacement and link everything together. In case atifs is used no linking is required.
What I meant was this: Scenario 1: GLSL is used for pixel shaders and fragment pipeline replacement. How do I find out if I have to link the fragment replacement GLSL shader or the pixel shader GLSL shader into the program I activate? -> Share private data between GLSL shader backend and GLSL fixed function object
Followup scenario 2: We don't have a GLSL fixed function backend yet. I have an ATI X1600 card, I am using GLSL for pixel shaders, and I am using ATIFS for the pipeline replacement. The GLSL pixel shader code reads the GLSL fragment pipeline replacement private data. -> This GLSL fragment replacement private data is either nonexistent or hanging around without code maintaining it. How do we deal with that?
They are pretty widespread and are even used in the EEEPC, so I think dealing with these cards will become a priority soon, at least for me. Unfortunately the driver sucks in terms of stability and performance.
Are those cards powerful enough to support a fixed function replacement?
Afaik the older cards do not support hardware vertex processing, it's done in software, so it depends on the driver. On the fragment side they are pretty solid and work only with programmable fragment processing internally. The newer X3100 only supports programmable vertex and fragment processing in hardware, so a pipeline replacement is either done by us or by the driver. It might be powerful enough for GLSL as well though.
On 12/04/2008, Stefan Dösinger stefan@codeweavers.com wrote:
But since D3D vertex shaders always read the numbered arrays and fixed function always reads the named arrays the named arrays get de-facto replaced as far as we're concerned.
Which is completely irrelevant for classifying the operation. Please read the ARB_vertex_program spec, issue 3 and the ARB_fragment_program spec, issue 13 to get a better idea of what shaders replace and what they don't.
You only need two. One to toggle writing 1.0 to the 4th coordinate when needed, and one to toggle copying the 3rd coordinate to the 4th when needed. It would certainly beat doing an extension check for every possible backend. Right now we always do the fixup, so in that respect it would be an improvement as well.
I'm working on a patch that makes the atifs shader code take care of applying the texture transform matrix. That works naturally without any flags or backend check if you don't try to split vertex and fragment processing by force.
If the issue is that you've got an interest in keeping the existing structure because you've already written code on top of it, there's not much point in having this discussion in the first place. If that's not the issue, I'd like to mention that the whole point of having interfaces is that you can avoid ugliness like setting vertex processing state in the fragment processing part of the pipeline. There's also nothing forceful about splitting your pipeline into vertex and fragment processing, that's how the hardware works, it's how GL works, and it's how D3D works.
That depends on the fog type. Vertex fog is a vertex state, fragment fog is a fragment state. Changing the type would obviously have interactions with both parts of the pipeline.
On the GL side both vertex and fragment fog are applied to the same GL state. Using an ARBFP or GLSL fragment shader replaces vertex fog as well, so you'll have to implement both types in the fragment processing replacement.
The fog blending is a fragment operation, yes. Coordinate calculation depends on the coordinate source and fog hint, and can happen either during vertex processing, fragment processing or not at all if fog coordinates are specified.
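For what it's worth, the fog factor math a replacement would have to cover is the standard set of equations (a sketch; the struct and names are invented, d is the fog coordinate or eye-space depth, the caller clamps the result to [0, 1]):

    #include <math.h>

    struct fog_settings
    {
        enum { FOG_NONE, FOG_LINEAR, FOG_EXP, FOG_EXP2 } mode;
        float start, end, density;
    };

    static float compute_fog_factor(float d, const struct fog_settings *fog)
    {
        switch (fog->mode)
        {
            case FOG_LINEAR: return (fog->end - d) / (fog->end - fog->start);
            case FOG_EXP:    return expf(-fog->density * d);
            case FOG_EXP2:   return expf(-(fog->density * d) * (fog->density * d));
            default:         return 1.0f; /* no fog */
        }
    }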
The fog settings depend on the vertex decl(XYZ vs XYZRHW) and the shader properties("foggy shader"). That means the fragment processing code(pixel shader and ffp replacement) would have to look at core vertex processing settings. Doesn't that defeat separating them in the first place?
No, like you correctly mention below, the point is to separate the implementation, not where the implementation gets its information from on the D3D side.
(I understand that you want to split the state types on the GL side, not the D3D side. But when you split applying of one D3D state in 3 pieces, I fail to see how it is cleaner)
This would hardly be something new. You mentioned the vertexdeclaration state yourself that modifies multiple GL states, and it's hardly the only one.
You didn't answer how you plan to implement state dirtification. You have a SetRenderState call that changes render state X. Which implementation state(s) do you dirtify? I.e.:
device->fragment_state_manager->mark_state_dirty(device->fragment_private_data, state);
Where does "state" come from?
I don't remember you asking, but I see no reason to change the basic way dirtification is currently done.
if (!use_vs) { device->vertex_state_manager->apply_states(device->vertex_private_data); }
...
This would be done where?
ActivateContext, CTXUSAGE_DRAWPRIM. (Yes it should probably be part of the context, not the device, my bad.)
No, I mean a D3D (3.0) vertex shader that is translated and running via GLSL. Currently the pixel shader decides where the vertex shader(all versions) writes the varyings to(generate_param_reorder_function). If a fixed function vertex replacement shader and pre-3.0 shader writes to the regular fixed function output, how would you run a pixel shader like the one in vshader_version_varying_test() in visual.c together with a 1.x or 2.0 vertex shader, or XYZRHW data from a GLSL vertex pipeline replacement shader? (With the fixed function GL vertex pipeline we're screwed, but we aren't necessarily screwed with a GLSL vertex pipeline replacement)
The pipeline object would certainly have access to all the information required to create such a reordering function, but I fail to see how it's relevant at this point. The idea here certainly isn't to magically fix all state and shader related issues in wined3d, it's just about making fixed function replacement shaders possible in a maintainable way.
What I meant was this: Scenario 1: GLSL is used for pixel shaders and fragment pipeline replacement. How do I find out if I have to link the fragment replacement GLSL shader or the pixel shader GLSL shader into the program I activate? -> Share private data between GLSL shader backend and GLSL fixed function object
Followup scenario 2: We don't have a GLSL fixed function backend yet. I have an ATI X1600 card, I am using GLSL for pixel shaders, and I am using ATIFS for the pipeline replacement. The GLSL pixel shader code reads the GLSL fragment pipeline replacement private data. -> This GLSL fragment replacement private data is either nonexistent or hanging around without code maintaining it. How do we deal with that?
Not exactly. In the first place, if there's no GLSL fixed function implementation you'll never have to link to it, simple as that. Now in case there *is* a GLSL fixed function replacement but it isn't used (for whatever reason), that simply means the GLSL pipeline object's private data will tell it it doesn't have to link anything (or rather, *not* tell it it has to link to the ffp replacement).
There really is no distinction between "pixel shader private data" and "ffp replacement private data", they're both pointers to the same block of memory, and eg. the shader backend will always only get passed its own private data.
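A minimal sketch of what that one block of memory could look like (the field names are invented, not the actual implementation):

    /* Single private data block shared by the GLSL shader backend and a
     * GLSL ffp replacement. */
    struct shader_glsl_priv
    {
        struct list programs;            /* cache of linked GLSL programs */
        GLhandleARB ffp_vertex_shader;   /* 0 = no vertex replacement to link */
        GLhandleARB ffp_fragment_shader; /* 0 = no fragment replacement to link */
    };

A zero-initialized block then already is the valid "do not link a replacement" default.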
I'd also like to note that most of the issues you bring up are not new or specific to this design at all, and some of them wouldn't work at all with the current structure.
On Saturday, 12 April 2008 19:55:53, H. Verbeet wrote:
Which is completely irrelevant for classifying the operation. Please read the ARB_vertex_program spec, issue 3 and the ARB_fragment_program spec, issue 13 to get a better idea of what shaders replace and what they don't.
Yes, GL says so, but I think strictly sticking to the GL classification is a bad idea, see below.
If the issue is that you've got an interest in keeping the existing structure because you've already written code on top of it, there's not much point in having this discussion in the first place. If that's not the issue, I'd like to mention that the whole point of having interfaces is that you can avoid ugliness like setting vertex processing state in the fragment processing part of the pipeline. There's also nothing forceful about splitting your pipeline into vertex and fragment processing, that's how the hardware works, it's how GL works, and it's how D3D works.
I agree that GL works that way(fragment-vertex split), and almost certainly graphics hardware as well, although we don't know unless we look at the drivers(how the hardware works is irrelevant since GL abstracts that). However, D3D does *not* work that way, otherwise there would not be any issue to discuss here, since then D3D states would perfectly match the GL ones.
As I understand it, this discussion comes down to designing the state setting interface and implementation in an OpenGL-oriented way or a D3D-oriented way.
@ keeping old code: That's not too much of an issue, but you attacked my design on the grounds of missing functionality, so I explained how that would be implemented.
device->fragment_state_manager->mark_state_dirty(device->fragment_private_data, state);
Where does "state" come from?
I don't remember you asking, but I see no reason to change the basic way dirtification is currently done.
How does SetRenderState(or any other state setter) know whether it has to dirtify a fragment, vertex or misc state, or several of them?
if (!use_vs) {
    device->vertex_state_manager->apply_states(device->vertex_private_data);
}
...
This would be done where?
ActivateContext, CTXUSAGE_DRAWPRIM. (Yes it should probably be part of the context, not the device, my bad.)
That means polling the states that use_vs() and use_ps() check for changes instead of getting notified about them, as well as polling the SetPixelShader and SetVertexShader settings(via select_shader()). Avoiding polling was one of the goals of the state management rewrite more than a year ago. I don't think there's any relevant performance penalty in the little polling you suggest, but where do we draw the line?
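For illustration, the polling could at least be kept cheap by caching the previous result (a sketch; the use_vs() signature and the last_was_vshader context field are assumptions here):

    static void check_vertex_processing(IWineD3DDeviceImpl *device,
                                        WineD3DContext *context)
    {
        BOOL vs_now = use_vs(device->stateBlock);
        if (vs_now == context->last_was_vshader) return; /* nothing changed */
        context->last_was_vshader = vs_now;
        if (!vs_now)
            device->vertex_state_manager->apply_states(device->vertex_private_data);
        /* else: the shader backend's select function takes over */
    }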
The pipeline object would certainly have access to all the information required to create such a reordering function, but I fail to see how it's relevant at this point. The idea here certainly isn't to magically fix all state and shader related issues in wined3d, it's just about making fixed function replacement shaders possible in a maintainable way.
It is relevant insofar as it determines which issues we can later fix with the pipeline replacement and which we cannot. If the GLSL vertex pipeline replacement shader can only write to the builtin varyings because the pixel shader input expects it that way, we'll never be able to fix the todo_wines in fixed_function_varying_test() in visual.c.
... There really is no distinction between "pixel shader private data" and "ffp replacement private data", they're both pointers to the same block of memory, and eg. the shader backend will always only get passed its own private data.
Point taken, I see now how you can sort out the fixed function replacement vs shader activation in the same backend via the private data. You'll have to watch out though that the default initialization value is a valid "do not load a replacement shader" request.
I'd also like to note that most of the issues you bring up are not new or specific to this design at all, and some of them wouldn't work at all with the current structure.
Surely they aren't specific, but I do not see any that would not work with the current structure. Agreed, using ATIFS fragment processing + ARBFP / GLSL pixel shaders needs 3 inherited ATIFS shader backends, which is ugly. But that does not hide the ugliness, it makes the issues caused by mixing GL shader functionality explicit. It does not limit the fixed function replacement to the lowest common denominator(GL fixed function) and allows us to flexibly use additional features of GL extensions or GLSL to fix bugs we'd otherwise have to mark as WONTFIX.
I think designing the state setting interface in a D3D-oriented way is better because it is OpenGL extension independent. If the state setting API is built on the GL state classifications, some additional layer is needed to deal with the differences between the state implementations(e.g. by giving the vertex, fragment and misc state setters a full state table each). Additionally, with the splitup there is the already discussed hidden issue of implementation interactions, which I don't think can be absorbed by the abstraction layers, and the mentioned issue of the lowest common denominator interface of the fixed function replacement shaders.
Stefan Dösinger wrote:
Alexandre didn't commit the patch, I think we should come to an agreement on this issue, otherwise it is going to come up again and again.
The fundamental issue is pretty straightforward - not sure why it's so difficult to come to an agreement.
- You want to mix and match vertex and fragment GL backends
- The only maintainable way to do that is to define an interface between vertex and fragment objects
- You're concerned about the interface constraining the ability of fragment to talk to vertex
- Write a smarter interface; I also suggested an object to manage it (linker).
You can't have it both ways - ability to mix and match backends, and unconstrained interface. The interface doesn't have to be "the lowest common denominator" - it could be the highest common denominator if properly written. As Henri pointed out the fact that both GL and the hardware are vertex/fragment aligned naturally suggests we should break up things the same way. Even the D3D programmable pipeline is broken up this way (there are Pixel and Vertex shader objects) - and the fixed pipeline is going away, so if anything we should move away from its interface.
- You want to replace the fixed pipeline using GL shaders.
- Then replace the fixed pipeline - this has nothing to do with the programmable pipeline, therefore should not affect any programmable pipeline interfaces.
You've pointed out that shader_backend_t is more aligned to a GL extension than it is to a D3D shader. If that's the case I see little value to having that interface - it will just cause confusion as unrelated functionalities are packed into it. The disparity between GL extensions will get greater as we try to add new features using a particular extension, and we'll see a lot of functions that are just empty forwards to other "shader backends", since they don't support something themselves.
Ivan
On Saturday, 12 April 2008 04:27:01, Ivan Gyurdiev wrote:
Stefan Dösinger wrote:
Alexandre didn't commit the patch, I think we should come to an agreement on this issue, otherwise it is going to come up again and again.
The fundamental issue is pretty straightforward - not sure why it's so difficult to come to an agreement.
- You want to mix and match vertex and fragment GL backends
- The only maintainable way to do that is to define an interface between vertex and fragment objects
I certainly see the advantages of a constrained interface, I just didn't see(and still don't see) how it can be designed cleanly without greatly limiting the functionality of the pipeline / shader implementation.
I discussed the topic with Henri on IRC again(@Henri: Please correct me if I misunderstood you), and he explained that his plan is to make GLSL vertex shaders, GLSL vertex replacement, GLSL fragment shaders and GLSL fragment replacement one object with different interfaces. So we can have backchannel communication between the various interface implementations(e.g. flags or private data), which keeps everything flexible. It's not precisely nice(backchannel communication somewhat defeats the point of interfaces), but my design has ugliness as well, so I can live with that.
What's most important to me about this is that we don't have to close any bug as WONTFIX due to design constraints. So I can stop feeling strongly against splitting up the interface since the implementations remain the same.
Now the main issue I see with not splitting up the interfaces is that it is unclear what code in state.c(or the shader backend's state table) changes which GL state. If state_something changes both vertex and fragment GL states I can't override it properly in the ATIFS/NVRC code without messing with the vertex side as well.
I have a few remaining issues though:
-> With splitting up the state table there are now 5 "root" states which have to be polled for changes when applying CTXUSAGE_DRAWPRIM, and the GL_TEXTURE_SHADER_NV and GL_FRAGMENT_SHADER_ATI enables need to be polled as well. Is there no way to avoid that?
-> Some cross pipeline part communication issues are still remaining, see below
-> Increased state dirtification complexity: Now each Set*State has to find out which part of the pipeline it has to dirtify(a switch-case statement or, more likely, table referencing), and has to dirtify up to 3 pipelines. That's not precisely going to help performance.
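With table referencing, that lookup could be as simple as this sketch (the mask names are invented; state_parts[] would be filled while setting up the split tables):

    #define PART_VERTEX   0x1
    #define PART_FRAGMENT 0x2
    #define PART_MISC     0x4

    /* Per-state bitmask of the pipeline parts that handle it. */
    static BYTE state_parts[STATE_HIGHEST + 1];

    static void mark_state_dirty(IWineD3DDeviceImpl *device, DWORD state)
    {
        BYTE parts = state_parts[state];
        if (parts & PART_VERTEX)
            device->vertex_state_manager->mark_state_dirty(device->vertex_private_data, state);
        if (parts & PART_FRAGMENT)
            device->fragment_state_manager->mark_state_dirty(device->fragment_private_data, state);
        if (parts & PART_MISC)
            device->misc_state_manager->mark_state_dirty(device->misc_private_data, state);
    }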
I know that Ivan doesn't really care about that, especially since it's not an algorithmic complexity change. However, performance is a top priority issue, and no gamer will accept the next-gen hardware excuse for inefficient code.
I mainly want to avoid more bad PR like this: http://www.phoronix.com/scan.php?page=article&item=938 http://www.phoronix.com/scan.php?page=article&item=crossover_games
The 2nd article would be good for me if the difference was due to a tuneup in cxgames, but it is a regression in wine instead. Yes, that's phoronix and all, and the first article's regressions are technically perfectly explainable. Still, new features don't have to come with performance costs. I have a ~5% performance regression myself, and I don't know where it comes from.
So I propose the following plan:
1) We commit the patch to fix fglrx
2) We keep the shader / ffp interface as it is for Wine 1.0. We freeze in two weeks. I am away for one week now, so if anyone wants any shader interface changes in 1.0 he'll have to do it himself
3) We investigate where the recent performance regression(s) came from
4) We audit the state handlers in state.c and find out which D3D state handler changes which parts of the GL pipeline and find states that touch more than one part and why.
5) Build a battle plan for how to separate the following D3D states into the various GL pipeline parts:
-> Vertex shaders - streams. The tricky part here is that the fixed function GL vertex pipeline needs named arrays, while an ARB/GLSL vertex pipeline needs numbered arrays(otherwise no vertex blending emulation). How do we communicate the need for numbered arrays, and the chosen assignment? (See the sketch after this list for the two load paths.)
-> vertex decl - loaded pointers. We're currently checking the vertex shader and fog states when a vertex buffer offset is changed, that is not needed
-> Samplers - GL_TEXTURE_xD enable - colorop. That's a major pain for ATIFS and even more so for NVTS. I haven't found a nicer implementation using split up interfaces
-> How do we deal with the depth blit shaders? Do they belong to the shader backend, or to something else?
-> Should we move the SetupForBlit to some of the shader code? The blitting and state switching might be more efficient if we're using shaders for it and just set an ARB / GLSL shader instead of falling back to the absolute low level limit and killing all states
-> Texture transform flags, clipping, more?
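To illustrate the streams item above, here are the two load paths for the same position data in plain GL (a sketch; stride and data stand in for whatever the vertex declaration provides):

    static void load_position(const void *data, GLsizei stride, BOOL numbered)
    {
        if (!numbered)
        {
            /* Named array: what the fixed function GL vertex pipeline reads. */
            glVertexPointer(3, GL_FLOAT, stride, data);
            glEnableClientState(GL_VERTEX_ARRAY);
        }
        else
        {
            /* Numbered array: what an ARB/GLSL vertex program reads via
             * generic attribute 0 - needed for vertex blending emulation. */
            glVertexAttribPointerARB(0, 3, GL_FLOAT, GL_FALSE, stride, data);
            glEnableVertexAttribArrayARB(0);
        }
    }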
My ultimate hope is to have a clean assignment of each piece of code referenced by the state table to a pipeline part, so instead of splitting up the state table in code we have a programming guideline about which part of the state table may be changed by which pipeline replacement, avoiding the additional run-time costs. If that turns out to be impossible we need a clear assignment anyway for a splitup.
6) Investigate the performance implications of the state management, state polling, the current conditional state linking state dirtification checks and the driver side cost of fixed function state changing while a
What are your opinions? (btw, I don't think I can implement that stuff alone anytime soon, I am pretty busy for the next months)
Even the D3D programmable pipeline is broken up this way (there are Pixel and Vertex shader objects) - and the fixed pipeline is going away, so if anything we should move away from its interface.
One could argue that the name "pixel shader" already shows that Direct3D does not separate the pipeline parts properly(GL_NV_texture_shader, issue 1). Ironically, pixel processing is one of the few remaining parts of Direct3D10 that is not programmable.
What are your opinions?
I can't comment on the specifics of the state table design - would have to leave that to Henri. However, anything that increases modularity is a good thing. I'm in favor of the overall direction to break up the pipeline into component stages. Smaller components are easier to manage - specifically in your case this should allow separate backends to be chosen per pipeline stage.
It seems reasonable that these changes would be deferred until after 1.0.
Ivan
A small implementation change plan:
With Henri's suggestion we have two levels of indirection: one that maps a D3D state to a pipeline part and a state, and one that maps a pipeline part and a state to an application function. Since both mappings are known at device creation time at the latest, we can remove one indirection there.
So my suggestion is this: Instead of trying to keep the single table in code, add 3 pipeline stage backends, like Henri suggested: a vertex one, a fragment one and a misc backend. Each pipeline backend has a description structure which contains priv data creation and destruction functions, possibly some flags to tell other parts of the pipeline how to communicate with it, and a set of states and application functions:
struct pipeline_backend
{
    DWORD state;          /* State this sets */
    apply_func apply;     /* Apply function like the current state mgmt uses */
    DWORD representative; /* For state grouping */
};
state and representative work like the current state identifiers via STATE_RENDER(x), STATE_TEXTURESTAGE(x, y), etc. The apply function takes the stateblock, state and context as arguments, as usual.
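For illustration, a partial fragment table could then look like this (the handler names are made up for the example; only the shape matters):

    static const struct pipeline_backend atifs_fragment_states[] =
    {
        { STATE_TEXTURESTAGE(0, WINED3DTSS_COLOROP), set_tex_op_atifs,
          STATE_TEXTURESTAGE(0, WINED3DTSS_COLOROP) },
        { STATE_RENDER(WINED3DRS_FOGENABLE), state_fog_atifs,
          STATE_RENDER(WINED3DRS_FOGENABLE) },
        { 0, NULL, 0 } /* terminator, see the compile step below */
    };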
The current global state table(s) are removed. At device creation(or Init3D or somewhere else) we select a vertex, fragment and misc state backend and a shader backend. The device contains a full state table like the current FFPStateTable / ATIFSStateTable. A pipeline compiler(is there some better name?) iterates over the 3 partial state tables and inserts them into the device's state table. If a state is handled by more than one pipeline part, a helper function can be used which calls the callbacks in a row. That's not as efficient as the current inlining, but it should be at least as fast as in Henri's proposal, and I can live with that.
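A sketch of that compile step, using the terminator convention from the table above (names invented; the chaining helper for multiply-handled states is left out):

    /* Copy one partial table into the device's full table (indexed by state,
     * sized STATE_HIGHEST + 1); called once per pipeline part. */
    static void compile_state_table(struct pipeline_backend *table,
                                    const struct pipeline_backend *part)
    {
        for (; part->apply; ++part)
        {
            if (table[part->state].apply)
            {
                /* State handled by another part too: install a helper that
                 * calls both apply functions in a row instead. */
            }
            table[part->state] = *part;
        }
    }

    /* At device creation: */
    compile_state_table(device->StateTable, vertex_part);
    compile_state_table(device->StateTable, fragment_part);
    compile_state_table(device->StateTable, misc_part);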
Advantages:
-> We can split up pipeline part handlers and select them dynamically etc
-> We have no additional overhead in the rendering loop since we only deal with one state table there
-> Minimal state polling because pixel and vertex shaders are dirtifyable states
Problems:
-> The issues from the last mail still apply and need to be solved(not a problem, just mentioning)
-> Shaders not 100% separated from the fixed function pipeline since both are equal states in the vertex and fragment pipelines
-> Neither a state handler nor shader_select is guaranteed to be applied on each draw, so we can't use them to enable GL_FRAGMENT_SHADER_ATI and GL_TEXTURE_SHADER_NV. That will have to be a fixed function fragment pipeline callback(see the sketch below). If there's a shader implementation using the same extension in use as well, they can sort this out via private data sharing. (Note: we could get away with using the colorop setter for this, because after device creation this state is dirty, after a blit it is dirty, and otherwise the extension will be enabled anyway. I don't particularly like that as it is a rather fragile setup)
-> What do we do if a state has different representatives in different pipeline parts? E.g. the vdecl and vertex shader will be linked in both misc and vertex pipeline backend. In the misc backend, the stream sources will be added there as well. This means that if a stream source is changed the VDECL fog/lighting changes will be performed needlessly(it happens currently as well, so it won't be a regression). There may be other such state groups, but I can't think of any right now.
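The fragment pipeline callback mentioned two items up would be trivial, along these lines (a sketch, assuming atifs naming):

    /* Enable/disable callback owned by the atifs fragment pipeline, called
     * when fixed function fragment processing is switched on or off. */
    static void atifs_fragment_enable(IWineD3DDevice *iface, BOOL enable)
    {
        if (enable) glEnable(GL_FRAGMENT_SHADER_ATI);
        else glDisable(GL_FRAGMENT_SHADER_ATI);
    }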
Does that sound like a reasonable idea?