Hi,
I don't know much about the current shader code but is this the right way to go thinking about GL_EXT_texture_swizzle + GL_ARB_texture_rg? For instance Nvidia drivers will soon get the swizzle extension and for that you don't have to fixups anymore. Is moving this way the right approach, so that it won't become a mess later on?
Roderick
2008/11/27 Roderick Colenbrander thunderbird2k@gmx.net:
Hi,
I don't know much about the current shader code but is this the right way to go thinking about GL_EXT_texture_swizzle + GL_ARB_texture_rg? For instance Nvidia drivers will soon get the swizzle extension and for that you don't have to fixups anymore. Is moving this way the right approach, so that it won't become a mess later on?
I'm not really convinced either. I would prefer something more descriptive, (like the way EXT_texture_swizzle handles this, for example), rather than listing every combination we use in an enum.
I'm not really convinced either. I would prefer something more descriptive, (like the way EXT_texture_swizzle handles this, for example), rather than listing every combination we use in an enum.
I don't think that gains us a lot(and how do you describe something like YUV conversions?) These texture fixups apply to the shader, while GL_EXT_texture_swizzle applies to the texture.
With GL_EXT_texture_swizzle we would not set any shader conversion flag for most formats(except YUV ones) and instead configure the texture_swizzle in basetexture.c when creating a new texture name.
2008/11/27 Stefan Dösinger stefan@codeweavers.com:
I'm not really convinced either. I would prefer something more descriptive, (like the way EXT_texture_swizzle handles this, for example), rather than listing every combination we use in an enum.
I don't think that gains us a lot(and how do you describe something like YUV conversions?) These texture fixups apply to the shader, while GL_EXT_texture_swizzle applies to the texture.
It's not about gaining stuff, it's about creating an interface that makes sense, even in the long term. If I look at the shader_color_conv enum, I might be able to guess what each element is supposed to do, but I don't really know until I look at what formats it's used with, and how GLSL and ARB implement that specific conversion. That's bad if I would want to implement a shader backend, but it's also bad if I want to add support for a new format that needs this kind of conversion. I would need to look through the list and implementation to see if there's a conversion that does what I want, and if there isn't, add an element to the list and implement it in all backends. Describing the operation that needs to be performed on each component on the other hand would allow both the implementation and the code using it to be mostly generic.
It's not about gaining stuff, it's about creating an interface that makes sense, even in the long term. If I look at the shader_color_conv enum, I might be able to guess what each element is supposed to do, but I don't really know until I look at what formats it's used with, and how GLSL and ARB implement that specific conversion. That's bad if I would want to implement a shader backend, but it's also bad if I want to add support for a new format that needs this kind of conversion. I would need to look through the list and implementation to see if there's a conversion that does what I want, and if there isn't, add an element to the list and implement it in all backends. Describing the operation that needs to be performed on each component on the other hand would allow both the implementation and the code using it to be mostly generic.
I agree in theory, but I doubt that this works in practise. To understand what a more fine-grained interface does you still have to read the code or the docs(e.g. the EXT_texture_swizzle spec, although we can't use that as-is).
Issue 1: I want to pack the color fixup information as tight as possible. A format like EXT_texture_swizzle needs more space for possible combinations we'll never use. My current enum needs 4 bits of storage.
Issue 2: The code that reads this information. If we want to generate efficient shader code that does the fixup in ideally one line of ARB asm we'll have to have lots of if checks for specific setups, and when we add a new fixup we have to add specialized code to each shader backend anyway. Keep in mind that we're not only reading RGBA components and 1.0 and 0.0 but also have the sign correction.
Issue 3: How do we compare two conversions efficiently? As long as we can pack it into a DWORD that works, but anything beyond that will get tricky(otherwise we could e.g. provide the color fixup as D3D ASM, that would avoid issue 2)
Issue 4: How do you describe YUV conversions? I don't want to move that into a different data structure because in the future we may want the blitting code to be able to handle non-YUV conversions too(e.g. R32F -> ARGB32F blits)
So in theory it sounds good, but I don't see what a nice implementation would look like. I'd be happy to be prooven wrong though.
2008/11/27 Stefan Dösinger stefan@codeweavers.com:
Issue 1: I want to pack the color fixup information as tight as possible. A format like EXT_texture_swizzle needs more space for possible combinations we'll never use. My current enum needs 4 bits of storage.
Why do you think this is so important?
Issue 2: The code that reads this information. If we want to generate efficient shader code that does the fixup in ideally one line of ARB asm we'll have to have lots of if checks for specific setups, and when we add a new fixup we have to add specialized code to each shader backend anyway. Keep in mind that we're not only reading RGBA components and 1.0 and 0.0 but also have the sign correction.
So you've got 4 swizzles, 1.0, 0.0 and a sign conversion. Should be easy in GLSL. ARB might be slightly harder, but I don't see how it can be as hard as you describe.
Issue 3: How do we compare two conversions efficiently? As long as we can pack it into a DWORD that works, but anything beyond that will get tricky(otherwise we could e.g. provide the color fixup as D3D ASM, that would avoid issue 2)
If you really care deeply about memory usage, you can encode the entire conversion in 15 bits. If efficient decoding is also a requirement you'll probably need 16 bits.
Issue 4: How do you describe YUV conversions? I don't want to move that into a different data structure because in the future we may want the blitting code to be able to handle non-YUV conversions too(e.g. R32F -> ARGB32F blits)
YUV will be a special case either way, but note that if you go for the 16 bit encoding you'll have plenty of bits left to stuff some YUV ugliness in. Note that I'm not entirely convinced of the need for YUV conversion in shaders. Are there any actual applications that use any of the YUV formats with d3d8 or d3d9? There probably are for ddraw, but you don't have shaders there. D3d10 seems to have dropped the format completely. The fact that the YUV code currently breaks the d3d9 visual tests with FBO offscreen rendering mode and that you don't intend to fix that doesn't particularly help make me care.
So in theory it sounds good, but I don't see what a nice implementation would look like. I'd be happy to be prooven wrong though.
I'd be happy to have a shot, should the implementation prove challenging.
2008/11/27 Stefan Dösinger stefan@codeweavers.com:
Issue 1: I want to pack the color fixup information as tight as
possible. A format like EXT_texture_swizzle needs more space for possible combinations we'll never use. My current enum needs 4 bits of storage.
Why do you think this is so important?
Because we have to verify the color fixup on 16+3 samplers each draw, and with d3d10 there will be many more(Of course in d3d10 we may be lucky and have all the formats natively in opengl without the need for shader conversion) My automated performance tests show a small, but reproducible performance drop since my multiple pixel shader patches in some games. I have not yet checked why exactly, but I suspect its that the collection of all the stateblock data and the memcmp over the currently 72 byte ps_compile_args structure simply takes longer than the old CompileShader check. If we keep the color conversion information compact we can bring this structure down to 10 bytes.
So you've got 4 swizzles, 1.0, 0.0 and a sign conversion. Should be easy in GLSL. ARB might be slightly harder, but I don't see how it can be as hard as you describe.
It's not really "hard", but I don't see how the whole code will look nicer in the end.
If you really care deeply about memory usage, you can encode the entire conversion in 15 bits. If efficient decoding is also a requirement you'll probably need 16 bits.
That's 4 times as much as 4 bits... I don't care about efficient decoding, but 15 bits isn't a helpful number so we'd have one spare bit anyway.
YUV will be a special case either way, but note that if you go for the 16 bit encoding you'll have plenty of bits left to stuff some YUV ugliness in. Note that I'm not entirely convinced of the need for YUV conversion in shaders. Are there any actual applications that use any of the YUV formats with d3d8 or d3d9? There probably are for ddraw, but you don't have shaders there. D3d10 seems to have dropped the format completely. The fact that the YUV code currently breaks the d3d9 visual tests with FBO offscreen rendering mode and that you don't intend to fix that doesn't particularly help make me care.
Quicktime uses YUV formats in d3d9. It doesn't do any D3D geometry drawing with it(no windows driver supports that it seems), just blitting. I don't want to be able to do D3D drawing with YUV textures, but I'd like to be able to use the D3D shader conversion code with blitting should that become necessary.
And yes, YUV blitting is broken with FBOs, but that's a problem in the way the blitting codepath selection works, not the color conversion.
2008/11/27 Stefan Dösinger stefan@codeweavers.com:
Because we have to verify the color fixup on 16+3 samplers each draw, and with d3d10 there will be many more(Of course in d3d10 we may be lucky and have all the formats natively in opengl without the need for shader conversion) My automated performance tests show a small, but reproducible performance drop since my multiple pixel shader patches in some games. I have not yet checked why exactly, but I suspect its that the collection of all the stateblock data and the memcmp over the currently 72 byte ps_compile_args structure simply takes longer than the old CompileShader check. If we keep the color conversion information compact we can bring this structure down to 10 bytes.
It will likely be more efficient to only store and compare the samplers that need a fixup in the first place then. For most shaders you'll end up only checking a bitmask against zero in that case. Note that with a 16bit encoding you'd still cut the 64 bytes the array currently uses in half, in which case the structure would easily fit into a typical cacheline, if that's even the issue.
Quicktime uses YUV formats in d3d9. It doesn't do any D3D geometry drawing with it(no windows driver supports that it seems), just blitting. I don't want to be able to do D3D drawing with YUV textures, but I'd like to be able to use the D3D shader conversion code with blitting should that become necessary.
Fair enough.
It will likely be more efficient to only store and compare the samplers that need a fixup in the first place then. For most shaders you'll end up only checking a bitmask against zero in that case. Note that with a 16bit encoding you'd still cut the 64 bytes the array currently uses in half, in which case the structure would easily fit into a typical cacheline, if that's even the issue.
This looks like a nice idea. I don't know how we would catch a case where e.g. a sampler switches between two different formats that need conversion.
To get back to the original discussion though, I don't think we gain anything in terms of code readability(special case checking in arb), and if we need some YUV uglyness the interface doesn't look prettier. The other problem is that not all shader backends can do all the conversions, so we get problems with the shader / fragment pipeline implementation's is-this-conversion-supported callback(needs special case checking too). (e.g. in atifs the sign fixup is supported and free, but other fixups may need up to 4 instructions per texture read, which is 1/4th of the shader already. I don't know what that would look like in nvts, but we currently don't need any color fixups there, so we probably never will)
2008/11/27 Stefan Dösinger stefan@codeweavers.com:
It will likely be more efficient to only store and compare the samplers that need a fixup in the first place then. For most shaders you'll end up only checking a bitmask against zero in that case. Note that with a 16bit encoding you'd still cut the 64 bytes the array currently uses in half, in which case the structure would easily fit into a typical cacheline, if that's even the issue.
This looks like a nice idea. I don't know how we would catch a case where e.g. a sampler switches between two different formats that need conversion.
You'd still check the actual fixups, it just means that in most cases you only have to check the bitmask, and for the cases where a fixup *is* needed you only need to compare the samplers with a fixup. So if you've got a single sampler with fixup you only need to check 16 bits, if you've got two 32 bits, etc.
To get back to the original discussion though, I don't think we gain anything in terms of code readability(special case checking in arb), and if we need some YUV uglyness the interface doesn't look prettier. The other problem is that not all shader backends can do all the conversions, so we get problems with the shader / fragment pipeline implementation's is-this-conversion-supported callback(needs special case checking too). (e.g. in atifs the sign fixup is supported and free, but other fixups may need up to 4 instructions per texture read, which is 1/4th of the shader already. I don't know what that would look like in nvts, but we currently don't need any color fixups there, so we probably never will)
Maybe, but you'll need to check if the conversion is supported either way. Using one format or another for describing the conversion doesn't change much there. YUV ugliness is relative, it basically comes down to using the extra two values in each component to describe the type of YUV format. I do think it is important to have sane interfaces, even if they're mostly internal ones.
I forgot to mention this in my previous mail, but if we're going to change the color fixup code, we should really integrate it properly into the backend. Ie, shader_glsl_get_sample_function() and the ARB equivalent should be aware of the format conversion, in order to prevent having to swizzle from components that aren't available, like can happen with WINED3DFMT_L6V5U5.
I forgot to mention this in my previous mail, but if we're going to change the color fixup code, we should really integrate it properly into the backend. Ie, shader_glsl_get_sample_function() and the ARB equivalent should be aware of the format conversion, in order to prevent having to swizzle from components that aren't available, like can happen with WINED3DFMT_L6V5U5.
Agreed. Although I think there was a reason why we did not do that when we implemented the current structure of the color conversion code to replace a hacky ARB integration of the color conversion.