On Tue, Aug 21, 2012 at 10:52 PM, gurketsky <gurketsky@googlemail.com> wrote:
I'd just like to present the state of the ID3DXConstantTable implementation, so that hopefully no work is done twice. This goes specifically to Józef. I'm not sure what the plan is on this. Two problems arise, and I haven't had time to sort them out yet.
Thanks for notifying me. I was about to write some tests for structures in constant tables. I see that you've already written such tests.
Cheers, Józef Kucia
On 22.08.2012 12:23, Józef Kucia wrote:
Thanks for notifying me. I was about to write some tests for structures in constant tables. I see that you've already written such tests.
Cheers, Józef Kucia
Well, I just had a closer look again. My speed test triggered a problem, so the results aren't really comparable. But it looks like native only allocates handles when it really needs them. So I'm not sure whether we want to take the same approach or whether it is fine to allocate them all, as I've done. I fixed the test, and the native speed advantage disappeared. If that's fine with you, you could reuse the code, or I could clean it up a bit and send it. I haven't sent or improved it yet, because the question of solution 1 versus 2 isn't settled for me. I'm a little bit against version 1, because I think speed might be an issue, and no one has given a technical argument against 2. I don't think an extra handle list for handles with "small" values is the way to go, because it may also hit the values where strings could be in memory. The extra handle list would be solution 3, but I think it needs a lot more memory just to add a "layer" for checking the handles. Thus 2 is a lot better than 3 (which I haven't explained in detail).
To tackle the problem:
1. The handle and table mixing could be worked around by using a global list for all tables and searching for the handles in there. That should be easy to add. The problem I see with that is speed: when we have a lot of handles, searching the list will be slow. Still, I'm fine with that solution and would add it. We'll see whether it really is that slow...
2. But the other solution isn't dead for me yet. I had another look at the D3DXHANDLE usage and asked myself what the hell D3DXCONSTTABLE_LARGEADDRESSAWARE is used for. It was said to be bad and broken, but I haven't seen why; it is ugly, but I couldn't see a technical reason not to do it. So what specifically is the problem?
The argument is that D3DXHANDLEs are distinguished from strings by the highest bit (bit 31). Thus with LARGEADDRESSAWARE, using strings as D3DXHANDLEs is no longer allowed (see http://msdn.microsoft.com/en-us/library/windows/desktop/bb943959%28v=vs.85%2...). This would also speed up the detection of a handle dramatically. It doesn't check for validity, but native doesn't do that either: if you pass a garbled handle, it will crash.
    D3DXHANDLE handle_from_constant(struct ctab_constant *constant)
    {
        if (largeadressaware && constant) return (D3DXHANDLE)constant;
        if (constant) return (D3DXHANDLE)((UINT_PTR)constant | 0x80000000);
        return NULL;
    }

    struct ctab_constant *is_valid_constant(struct ID3DXConstantTableImpl *table, D3DXHANDLE handle)
    {
        if (largeadressaware) return (struct ctab_constant *)handle;
        if ((UINT_PTR)handle >> 31) return (struct ctab_constant *)((UINT_PTR)handle & 0x7fffffff);
        return get_constant_by_name(table, NULL, handle);
    }
According to http://en.wikipedia.org/wiki/Virtual_address_space:
32-bit on 32-bit without LARGEADDRESSAWARE: has only 2 GB (default 32-bit)
32-bit on 64-bit without LARGEADDRESSAWARE: has only 2 GB (default 32-bit)
64-bit on 64-bit without LARGEADDRESSAWARE: has only 2 GB
32-bit on 32-bit with LARGEADDRESSAWARE: has 3 GB
32-bit on 64-bit with LARGEADDRESSAWARE: has 4 GB
64-bit on 64-bit with LARGEADDRESSAWARE: has 8 TB (default 64-bit)
So in cases where the exe is linked with LARGEADDRESSAWARE, d3dx9 would have to be used with D3DXCONSTTABLE_LARGEADDRESSAWARE. That way it's the same on 32-bit and 64-bit operating systems. The only problem I see is that nowhere is it stated that the 2 GB will always be the lowest 2 GB. But in my test runs I always got addresses within the lower 31 bits when allocating memory. So either I'm very unlucky in never getting a higher address, or this is simply how it works on Windows. Does anyone have a technical argument against this solution?
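To make the bit-31 idea in solution 2 concrete, here is a minimal sketch. It is not the Wine code: 32-bit addresses are modelled as plain uint32_t values so that it behaves the same on any host, and the names tag_handle, is_tagged_handle, untag_handle and HANDLE_TAG are made up for illustration.

```c
#include <stdint.h>

/* Illustrative only: a 32-bit address is modelled as a uint32_t value.
 * HANDLE_TAG is bit 31, which a non-LARGEADDRESSAWARE process is assumed
 * never to see set in a real address. */
#define HANDLE_TAG 0x80000000u

/* Tag an address so it can be told apart from a string pointer.
 * A zero address stays zero (the NULL handle). */
static uint32_t tag_handle(uint32_t address)
{
    return address ? (address | HANDLE_TAG) : 0;
}

/* Non-zero if the value has bit 31 set, i.e. it is treated as a handle. */
static int is_tagged_handle(uint32_t value)
{
    return (value >> 31) != 0;
}

/* Recover the original address from a tagged handle. */
static uint32_t untag_handle(uint32_t handle)
{
    return handle & 0x7fffffffu;
}
```

The check is a single shift, which is where the speed advantage over a list search would come from. The sketch also shows the weak spot: any value at or above 0x80000000, such as a string located above 2 GB, would be taken for a handle, so the scheme only holds while all real addresses stay below 2 GB.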
I hope this helps you to make the correct decisions.
Cheers Rico
On Wed, Aug 22, 2012 at 2:45 PM, Rico Schüller <kgbricola@web.de> wrote:
Well, I just had a closer look again. My speed test triggered a problem, so the results aren't really comparable. But it looks like native only allocates handles when it really needs them. So I'm not sure whether we want to take the same approach or whether it is fine to allocate them all, as I've done. I fixed the test, and the native speed advantage disappeared. If that's fine with you, you could reuse the code, or I could clean it up a bit and send it. I haven't sent or improved it yet, because the question of solution 1 versus 2 isn't settled for me. I'm a little bit against version 1, because I think speed might be an issue, and no one has given a technical argument against 2. I don't think an extra handle list for handles with "small" values is the way to go, because it may also hit the values where strings could be in memory. The extra handle list would be solution 3, but I think it needs a lot more memory just to add a "layer" for checking the handles. Thus 2 is a lot better than 3 (which I haven't explained in detail).
I would prefer you to clean it up and submit it. I hope it gets committed this time.
So in cases where the exe is linked with LARGEADDRESSAWARE, d3dx9 would have to be used with D3DXCONSTTABLE_LARGEADDRESSAWARE. That way it's the same on 32-bit and 64-bit operating systems. The only problem I see is that nowhere is it stated that the 2 GB will always be the lowest 2 GB. But in my test runs I always got addresses within the lower 31 bits when allocating memory. So either I'm very unlucky in never getting a higher address, or this is simply how it works on Windows. Does anyone have a technical argument against this solution?
It seems fine to me. A program compiled without *LARGEADDRESSAWARE* should get all the memory allocated below the 2 GB limit (see http://msdn.microsoft.com/en-us/library/windows/desktop/aa384271%28v=vs.85%2... ).
On 23.08.2012 15:43, Józef Kucia wrote:
I would prefer you to clean it up and submit it. I hope it gets committed this time.
Ok, I'll try to clean them up and send them. I will do it the safe way and compare each handle with all the handles we have. If that is slow, we could easily move to using the highest bit to recognize a D3DXHANDLE. We just have to take care that an easy switch is possible.
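As a rough illustration of the "safe way", the comparison against all known handles could look like the sketch below. Everything in it is a stand-in: the struct is a placeholder for the real one, the fixed-size array replaces whatever container the patches actually use, and register_handle and constant_from_handle are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical stand-in; the real Wine structure carries much more. */
struct ctab_constant { int dummy; };

#define MAX_HANDLES 64

/* One list of known constants (fixed-size here only for brevity). */
static const struct ctab_constant *known_handles[MAX_HANDLES];
static size_t known_handle_count;

/* Record a constant so its address is later accepted as a handle.
 * Returns 0 on success, -1 if the list is full. */
static int register_handle(const struct ctab_constant *constant)
{
    if (known_handle_count >= MAX_HANDLES) return -1;
    known_handles[known_handle_count++] = constant;
    return 0;
}

/* Linear search over all known handles. Returns the constant when the
 * value is a registered address, NULL otherwise; the caller would then
 * fall back to treating the value as a name string. */
static const struct ctab_constant *constant_from_handle(const void *handle)
{
    size_t i;

    for (i = 0; i < known_handle_count; ++i)
        if ((const void *)known_handles[i] == handle) return known_handles[i];
    return NULL;
}
```

The lookup never dereferences the value it is given, which is what makes it safe for garbled handles, and its cost grows with the number of handles, which is the speed concern raised earlier. Keeping the check behind one function like this is also what would make a later switch to the highest-bit test easy.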
Thanks for the input. Rico
On 23.08.2012 22:58, Rico Schüller wrote:
Ok, I'll try to clean them up and send them. I will do it the safe way and compare each handle with all the handles we have. If that is slow, we could easily move to using the highest bit to recognize a D3DXHANDLE. We just have to take care that an easy switch is possible.
Thanks for the input. Rico
Patches are in git now. The normal way should work fine. The following things should be considered (but I have no patches for these):
1. The cross calling (table1, variable_from_table2).
2. We could avoid the desc copy, because the variables may be set often, so the copy time will make some speed difference in set_*_array and GetSamplerIndex. Example for GetSamplerIndex, changing:

    res = ID3DXConstantTable_GetConstantDesc(iface, constant, &desc, &count);
    if (FAILED(res)) return (UINT)-1;
    if (desc.RegisterSet != D3DXRS_SAMPLER)

to something like:

    struct ctab_constant *c = get_valid_constant(This, constant);
    if (c->desc.RegisterSet != D3DXRS_SAMPLER)

3. The todo_wine should be fixed in the test. Is there a way to keep them from showing up when running e.g. "wine d3dx9_36_test.exe.so shader"? It's a bit annoying when you search for your own failing tests. Well, I could comment out the tests, but that's not a very nice solution either.
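For item 2, a self-contained sketch of reading the description in place might look like this. The types are simplified stand-ins rather than the d3dx9 ones, sampler_index is a hypothetical helper, and returning RegisterIndex for a sampler is an assumption about what GetSamplerIndex reports. The NULL check is added here because the short form above would dereference whatever the lookup returns.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the d3dx9 types. */
enum register_set { RS_BOOL, RS_INT4, RS_FLOAT4, RS_SAMPLER };

struct constant_desc
{
    enum register_set RegisterSet;
    unsigned int RegisterIndex;
};

struct ctab_constant { struct constant_desc desc; };

/* Reads the description through the pointer instead of copying it out.
 * Returns the register index for a sampler, or (unsigned int)-1 when the
 * constant is missing or is not a sampler. */
static unsigned int sampler_index(const struct ctab_constant *c)
{
    if (!c) return (unsigned int)-1;
    if (c->desc.RegisterSet != RS_SAMPLER) return (unsigned int)-1;
    return c->desc.RegisterIndex;
}
```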
Cheers Rico
On 28 August 2012 09:12, Rico Schüller <kgbricola@web.de> wrote:
3. The todo_wine should be fixed in the test. Is there a way to keep them from showing up when running e.g. "wine d3dx9_36_test.exe.so shader"? It's a bit annoying when you search for your own failing tests. Well, I could comment out the tests, but that's not a very nice solution either.
You can set WINETEST_PLATFORM=wine, but the easiest is probably to just do "make shader.ok" instead of trying to run the test manually.
On 28.08.2012 10:50, Henri Verbeet wrote:
You can set WINETEST_PLATFORM=wine, but the easiest is probably to just do "make shader.ok" instead of trying to run the test manually.
Thanks, that's exactly what I need.
Cheers Rico
On 28.08.2012 10:50, Henri Verbeet wrote:
You can set WINETEST_PLATFORM=wine, but the easiest is probably to just do "make shader.ok" instead of trying to run the test manually.
Well, I found another way: fixing the test also seems to be a solution :-)
But I've stumbled over a problem: how can I create shader variables which use D3DXRS_INT4 or D3DXRS_BOOL as the register set? My goal is to check different combinations of the RegisterSet and the Type (e.g. D3DXRS_FLOAT4 and D3DXPT_FLOAT, D3DXRS_INT4 and D3DXPT_FLOAT, D3DXRS_BOOL and D3DXPT_FLOAT, D3DXRS_INT4 and D3DXPT_INT, D3DXRS_BOOL and D3DXPT_INT, ...). I have an improved version of set_float_matrix, but I have no clue how to check whether the assumption is correct. It fixes one todo, but I'd like to test the rest if possible. Creating variables with D3DXRS_FLOAT4 or D3DXRS_SAMPLER is no problem. Does anyone have an idea?
Attached is a patch. I changed some more stuff (not only the set_float_matrix function), but the main goal is a test for that one. I'd also be happy if someone has criticism or suggestions for the rest of the patch.
Cheers Rico
Rico Schüller <kgbricola@web.de> writes:
So in cases, where the exe is linked with LARGEADDRESSAWARE, d3dx9 would have to be used with D3DXCONSTTABLE_LARGEADDRESSAWARE.
Yes, but in practice apps don't do that. Eve Online was reportedly one of the offenders, there are certainly others.